Re: How to view bayesian database in legible text

2017-11-13 Thread Michael Parker


> On Nov 9, 2017, at 10:15 AM, Emanuel  wrote:
> I am interested in seeing the bayes info in the database, because it was 
> created years ago
> 
> 

There is a plugin that lets you fill in the actual text for the 
hashed value.

https://wiki.apache.org/spamassassin/UnmaintainedCustomPlugins 


CollectTokens.pm - Unfortunately it doesn’t seem to be there anymore and my 
Google-fu didn’t find any traces; maybe you’ll have better luck.

Here is the original thread where someone basically asks the same question as 
you.

http://spamassassin.1065346.n5.nabble.com/getting-Bayes-token-data-from-spamassassin-td30092.html
 


As others have said, it’s impossible to reverse the hash; the plugin just fills 
in the value once it figures it out.
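
To illustrate why the hash cannot be reversed but can still be "filled in", here is a minimal Python sketch. It assumes tokens are stored as a truncated SHA-1 digest (the last 5 bytes), which is a recollection of the Bayes code rather than something stated in this thread. `hash_token` and `TokenCollector` are hypothetical names, not part of SpamAssassin or of CollectTokens.pm.

```python
import hashlib
from typing import Dict, Optional


def hash_token(token: str) -> bytes:
    """One-way hash of a token (assumed: last 5 bytes of its SHA-1 digest)."""
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]


class TokenCollector:
    """Remembers plain-text tokens as they are seen, keyed by their hash."""

    def __init__(self) -> None:
        self.seen: Dict[bytes, str] = {}

    def observe(self, token: str) -> None:
        # The hash is computed forwards from the known text; nothing is reversed.
        self.seen[hash_token(token)] = token

    def lookup(self, hashed: bytes) -> Optional[str]:
        # Returns None for hashes whose text was never observed.
        return self.seen.get(hashed)


collector = TokenCollector()
for word in ["viagra", "meeting", "invoice"]:
    collector.observe(word)

known = collector.lookup(hash_token("meeting"))        # text was observed
unknown = collector.lookup(hash_token("unseen-word"))  # text was never observed
```

Tokens learned before such a collector was installed stay opaque until the same text shows up again in a later message.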

Michael




Re: bayes sql: bayes_seen needs UPDATE

2017-06-24 Thread Michael Parker
Jesse,

Thanks for the report.  Please do get this into Bugzilla once you have the 
account set up.

Please make sure you include which version of MySQL you are running as well.

The Bayes SQL code hasn’t been updated in many, many years. It might be that 
MySQL changed the permissions for INSERT ... ON DUPLICATE KEY UPDATE to require 
UPDATE as well, but this is just a theory.
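
For readers unfamiliar with the statement in question, here is a small Python sketch of upsert behaviour. It uses SQLite's `ON CONFLICT ... DO UPDATE` (SQLite 3.24 or newer) as a stand-in for MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`, with a cut-down `bayes_seen` table whose (id, msgid) primary key is an assumption. It does not model MySQL privileges at all. It only shows why the statement is an INSERT and an UPDATE at once, and why re-learning the same message should not add a second row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for bayes_seen; the (id, msgid) primary key is an assumption.
conn.execute(
    "CREATE TABLE bayes_seen ("
    " id INTEGER NOT NULL,"
    " msgid TEXT NOT NULL,"
    " flag TEXT NOT NULL,"
    " PRIMARY KEY (id, msgid))"
)


def seen_put(user_id: int, msgid: str, flag: str) -> None:
    """Insert a seen-record, or update its flag if the row already exists."""
    conn.execute(
        "INSERT INTO bayes_seen (id, msgid, flag) VALUES (?, ?, ?) "
        "ON CONFLICT (id, msgid) DO UPDATE SET flag = excluded.flag",
        (user_id, msgid, flag),
    )


seen_put(2, "abc@sa_generated", "s")  # first learn: takes the INSERT path
seen_put(2, "abc@sa_generated", "h")  # re-learn: takes the UPDATE path

rows = conn.execute("SELECT id, msgid, flag FROM bayes_seen").fetchall()
```

After both calls the table holds a single row for the message, with the flag from the second call.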

Michael

> On Jun 22, 2017, at 1:49 PM, Jesse Norell  wrote:
> 
> Hello,
> 
> I'm working on converting a spam training script/setup which works with
> bayes dbm files to support sql bayes, and came across an error in the
> grants in the README.bayes file at:
> 
>  GRANT SELECT, DELETE, INSERT ON TABLE bayes_seen TO ;
> 
> I'm using the MySQL driver (maybe it matters), and UPDATE permission is
> needed on bayes_seen to avoid:
> 
>  write(6, "\257\0\0\0\3INSERT INTO bayes_seen (id, msgid, flag)\n      VALUES ('2','2d74cc15f332ac5a1789ac7d979ef9320ac98d80@sa_generated','s')\n\t ON DUPLICATE KEY UPDATE flag=VALUES(flag)", 179) = 179
>  read(6, "X\0\0\1\377v\4#42000UPDATE command denied to user 'spamassassin'@'localhost' for table 'bayes_seen'", 16384) = 92
> 
> I never did see any error printed by sa-learn on that, I just happened
> to catch it in tracing sa-learn to see what takes so long.  After
> granting UPDATE permission I see a few quirks with bayes_seen disappear,
> where re-learning the same message shows an increase in nspam or nham
> count (and entries in bayes_seen are duplicated), where using dbm files
> showed the counts stayed the same.  I was hoping for a performance
> improvement too, but not seeing much change there yet (though I don't
> have much of a baseline on this new system).
> 
> I'm running 3.4.1-6~bpo8+1 from jessie-backports, but README.bayes is
> the same:
> https://svn.apache.org/repos/asf/spamassassin/trunk/sql/README.bayes
> 
> 
> Thanks,
> Jesse
> 
> 
> (I've been waiting a few hours on a bugzilla email so haven't yet added
> this to the bug tracker.)
> 
> 
> -- 
> Jesse Norell
> Kentec Communications, Inc.
> 970-522-8107  -  www.kci.net
> 



Re: Shortcircuit work partially

2016-08-30 Thread Michael Parker
Apologies if any of this ends up being not so up to date.  It’s been ages since 
the Check plugin was written.

You’re most likely going to want to write your own Check plugin to change this 
behavior.

If you look at check_main you’ll see the DNS based tests get fired off before 
the priority loop.

Others who needed to prioritize short circuit logic have done this as well.

Michael

> On Aug 30, 2016, at 10:26 AM, Nicola Piazzi  
> wrote:
> 
> How can I do it synchronously?
> It is not important whether a single mail is processed in 5 or 50 seconds;
> for me it is most important to reduce load.
> 
> 
> 
> -----Original Message-----
> From: RW [mailto:rwmailli...@googlemail.com] 
> Sent: Tuesday, 30 August 2016 17:24
> To: users@spamassassin.apache.org
> Subject: Re: R: R: Shortcircuit work partially
> 
> On Tue, 30 Aug 2016 14:48:03 +
> Nicola Piazzi wrote:
> 
>> The problem is that DNS checks are made asynchronously. If they were made 
>> synchronously, it would happen like you said. Slowing down all messages is 
>> not important, because I save a lot of queries and CPU.
> 
> Running them synchronously would mean running them consecutively. What I 
> think you want would involve running them asynchronously, but starting them 
> later. 
> 
> This would reduce dns lookups, but scans would generally take longer.



Re: Perl regex compilation in spamd

2013-07-09 Thread Michael Parker

On Jul 9, 2013, at 9:18 AM, RW  wrote:

> 
> I was wondering when perl REs are compiled in spamd. My understanding
> is that perl compiles REs when they are first evaluated. Is anything
> done to make this happen in the spamd parent? 

Yes, on startup a fake message is scanned to prime the pump and get everything 
loaded.

Michael



Re: Bayes_vars records on MySQL not created automatically

2013-05-08 Thread Michael Parker

On May 8, 2013, at 8:06 AM, Matteo Dessalvi  wrote:

> 
> I always thought that SA would be able to operate autonomously and that it 
> will create the
> proper records in all the tables of the DB. Am I missing something? Is this 
> the designed behavior?
> 

It's been a while since I wrote and looked at the code, but I'm pretty sure that 
the bayes_vars entry won't be created until you learn something as that user.

Try doing an sa-learn or an auto-learn for that user and see what happens.

If memory serves, the behavior was deliberate so that you wouldn't get hundreds 
of entries in bayes_vars when messages are checked for users who may not be real.

Michael




Re: how decode tokens's column

2012-05-11 Thread Michael Parker

On May 11, 2012, at 8:34 AM, Jacopo Fabiani wrote:
> 
> 
> My question is: where am I going wrong? Is there a way to decode the encoded 
> tokens that I got with the sa-learn --backup command?
> 

No, there is no way to decode the bayes tokens.

Search this mailing list's archives from several years ago for possible 
workarounds to get what you want.

Michael




Re: Is there a header size limit on To: header rules?

2012-04-20 Thread Michael Parker

On Apr 20, 2012, at 12:17 PM, cyboc wrote:

> 
> We have been receiving 419 spam with extremely long lists of email addresses
> in the To: header. I'm talking hundreds of addresses. 
> 
> I've noticed that the same email addresses keep on appearing near my user's
> email address in the To: header. I have created a rule to check for those
> particular email addresses but unfortunately, it seems that SpamAssassin (at
> least my configuration) has some limit on how far it will check in the To:
> header. If the addresses I'm scanning for are too far down the list of
> addresses, SpamAssassin doesn't seem to detect them. I'm not sure if the
> limit is size based (e.g. 10 KB) or number of addresses based (e.g. 350
> email addresses).
> 
> I'm running version 3.3.1 of SpamAssassin. Is there some setting I can
> change to increase the (apparent) limit?

Pretty sure it's hardcoded at 8k.  This was done several years ago to protect 
against DoS type situations.
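
A minimal Python sketch of the effect being described, assuming the limit is a plain truncation of the header value to 8192 characters before rules run. The exact mechanism inside SpamAssassin is not shown in this thread, and `MAX_HEADER_LENGTH` and `header_rule_hits` are made-up names.

```python
import re

MAX_HEADER_LENGTH = 8192  # assumed value of the hardcoded limit ("8k")


def header_rule_hits(header_value: str, pattern: str) -> bool:
    """Run a regex rule against only the first MAX_HEADER_LENGTH characters."""
    visible = header_value[:MAX_HEADER_LENGTH]
    return re.search(pattern, visible) is not None


# Build a To: header with hundreds of addresses, with the target at the end.
addresses = [f"user{i}@example.com" for i in range(500)]
addresses.append("target@example.net")
to_header = ", ".join(addresses)

early_hit = header_rule_hits(to_header, r"user10@example\.com")  # near the start
late_hit = header_rule_hits(to_header, r"target@example\.net")   # past the cut-off
```

With a header this long, the rule for the early address fires while the rule for the address past the cut-off never sees its target.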

Michael


> -- 
> View this message in context: 
> http://old.nabble.com/Is-there-a-header-size-limit-on-To%3A-header-rules--tp33721936p33721936.html
> Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
> 



Re: New versions of Perl are slower

2012-04-10 Thread Michael Parker

On Apr 10, 2012, at 4:12 PM, Julian Yap wrote:

> I'm running SpamAssassin 3.3.2 port revision 6 (latest from FreeBSD
> ports) on FreeBSD 8.2-RELEASE 64-bit.
> 
> I recently upgraded my Perl from 5.10 to 5.14 but I needed to
> downgrade because SpamAssassin was crashing on a daily basis.  See
> bug:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6745
> 
> I have since downgraded my servers to Perl 5.10 and Perl 5.12.
> 
> I have noticed that Perl 5.12 runs noticeably slower compared to 5.10.
> I think 5.14 was slow as well.
> 
> Average scan times are higher and there are often more 'longer
> running' scans.  This results in more output on servers running Perl
> 5.12:
> tail -f /var/log/maillog | grep 'identified spam .*[2-9][0-9].[0-9] seconds'
> 
> Have others experienced the same thing?

I think you can back that up and say that anything > 5.8 is slower.  In perl 
5.10 they made major changes to the regex engine which must have added some 
overhead that now slows things down.  I've seen instances, depending on the 
total number of rules running, of 50% slowdowns moving from 5.8 -> 5.10 and 
beyond.

Michael


> 
> - Julian



Re: Spamassassin rules from sql

2012-02-01 Thread Michael Parker

On Feb 1, 2012, at 10:37 AM, Miguel Fernandes wrote:

> Hi!
> 
> I'm just wondering, is there a limitation on the type of rules that can be 
> added to the SpamAssassin rules table?
> Because adding something like (subject scoring):
> 
> header TEST_SUBJECT Subject =~ /test/i (I've tried several combinations of 
> fields: preference & value, with no success)
> score TEST_SUBJECT 30 (this one is possible)
> 
> Does not seem straightforward...
> 
> My goal is to be able to score subjects through sql rules.
> 

Perhaps you can make use of 

loadplugin Mail::SpamAssassin::Plugin::WhiteListSubject

blacklist_subject test
blacklist_subject This is a subject line I want to blacklist

That will add an extra 100 points to the message's score.

You can of course also whitelist subjects as well, feel free to look over the 
documentation.

whitelist_subject and blacklist_subject are not listed as admin commands, so 
they should work just fine in SQL.  You won't be able to use regexes, but you 
can use glob-type params.
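
As a rough illustration of what glob-type matching means for subjects, here is a Python sketch using `fnmatch`. How the plugin actually translates its patterns (case sensitivity, anchoring) is an assumption here, and `subject_is_blacklisted` is a hypothetical helper, not part of the plugin.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, as they might be stored via blacklist_subject.
BLACKLIST_PATTERNS = ["test", "*cheap meds*", "Invoice ????"]


def subject_is_blacklisted(subject: str) -> bool:
    """True if the whole subject matches any glob ('*' = any run, '?' = one char)."""
    return any(fnmatchcase(subject, pattern) for pattern in BLACKLIST_PATTERNS)


results = {
    subject: subject_is_blacklisted(subject)
    for subject in ["test", "a test message", "Buy cheap meds now", "Invoice 2012"]
}
```

In this sketch a bare pattern such as `test` only matches a subject that is exactly "test"; catching it anywhere in the subject needs wildcards on both sides.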

Michael




Re: A SpamAssassin Crash Course for Admins

2011-11-29 Thread Michael Parker

On Nov 29, 2011, at 9:13 PM, Dorian Chan wrote:

> Hello again,
> I've attached version 2.0 with this email (it's the clean version without all 
> the comments :) ). I've pretty much finished up the definitions and some 
> cleaning up. Again, I would really enjoy feedback!
> 

Everywhere you say "SpamAssassin" you should probably be saying "Apache 
SpamAssassin."

Michael

PS Kevin, this also applies to the listing on the Google Code-In site, is that 
something that can be fixed?




Re: Only running network tests when necessary - feature request

2010-10-29 Thread Michael Parker

On Oct 29, 2010, at 8:42 PM, dar...@chaosreigns.com wrote:

> I'd like to see spamassassin only run network tests when they might
> affect the outcome.

Why?

Assuming a reasonably fast connection network checks are basically free.

They are kicked off at the start of a scan and the results are compared at the 
end.  You're not exactly waiting on network tests to run, since all the other 
rules are run in the meantime.
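
The scheduling described above can be sketched in Python. This is only an analogy using a thread pool, not SpamAssassin's actual resolver code. `fake_dns_lookup`, `run_local_rules`, the list names and the scores are all invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_dns_lookup(blocklist: str) -> float:
    """Stand-in for a network test: waits briefly, then returns a score."""
    time.sleep(0.05)  # simulated network latency
    return 2.0 if blocklist == "bl.example.org" else 0.0


def run_local_rules(message: str) -> float:
    """Stand-in for the body/header rules that run while lookups are in flight."""
    return 1.5 if "free money" in message.lower() else 0.0


def scan(message: str) -> float:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # 1. Kick off every network test at the start of the scan.
        pending = [pool.submit(fake_dns_lookup, bl)
                   for bl in ("bl.example.org", "wl.example.org")]
        # 2. Run all the other rules in the meantime.
        score = run_local_rules(message)
        # 3. Harvest the network results at the end.
        score += sum(future.result() for future in pending)
    return score


total = scan("Claim your FREE MONEY today")
```

The scan only waits on the network for whatever lookup time is left after the local rules have finished.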


> 
> For example, if you run all non-network tests, and at that point an email's
> score qualifies as spam, and then you run all the non-spam network tests
> (hitting whitelists), and it still qualifies as spam, there's no reason
> to ever run the spam network tests (hitting blacklists).
> 
> Right?
> 
> Perl pseudocode:
> 
> run_non_network_tests();
> NETTEST: while (1) {
>   if ($score < 5) {
>     run_one_network_spam_test() or last NETTEST;
>   } else {
>     run_one_network_nonspam_test() or last NETTEST;
>   }
> }
> }
> 

OK, let's assume that this actually buys you something.  The good news is that 
you can easily provide your own Check.pm.

Michael


> 
> Of course it would be necessary to have an option to disable this, for
> things like submitting corpora to the mass checks.
> 
> I would probably run the network tests in random order.  But I might have
> more affinity for randomness than average.
> 
> -- 
> "Let's just say that if complete and utter chaos was lightning, then
> he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
> copper armour and shouting 'All gods are bastards'." - The Color of Magic
> http://www.ChaosReigns.com



Re: Writing Spamassassin plugin.

2010-06-18 Thread Michael Parker
These might be starting to get dated a little but I think that if you look at 
the "Extending Apache SpamAssassin Using Plugin" slides and notes from here: 
http://people.apache.org/~parker/presentations/index.html

That will give you a good idea on what you need to accomplish for your plugin.

Michael



Re: segmentation fault on startup

2010-01-10 Thread Michael Parker


On Jan 10, 2010, at 6:58 PM, Robert P. Weaver wrote:


[28414] dbg: replacetags: replacing tags
[28414] dbg: replacetags: done replacing tags
[28414] dbg: bayes: tie-ing to DB file R/O /users/rweaver/.spamassassin/bayes_toks
[28414] dbg: bayes: tie-ing to DB file R/O /users/rweaver/.spamassassin/bayes_seen

Segmentation Fault - core dumped



If I had to guess, I'd say your Berkeley DB libraries had an issue.   
You might try looking at those and seeing if they are corrupt or  
otherwise damaged.  Maybe re-install those.


As a test you can set use_bayes 0 in your local.cf file and see how  
things go.


Michael

PS so interesting, I used to run a Solaris mail server named sauron,  
many many moons ago, I guess it's a common name.




Re: actual facts (was Re: HABEAS_ACCREDITED SPAMMER)

2009-12-04 Thread Michael Parker

FYI, the original bug is here: 
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3998

For all the bitching about it, it took me about 30 seconds to find.

Michael



Re: How to log sending IP in spamd

2009-10-05 Thread Michael Parker


On Oct 4, 2009, at 1:46 PM, Steve Fatula wrote:

We use SpamAssassin via spamc/spamd via procmail. In the maillog  
file, we see that when there is spam, the message indicates a bunch of  
information. raddr always shows up as 127.0.0.1, which is of course  
our connection to spamd from our machine via procmail. Similarly,  
rhost is our machine.


We are trying to tally up totals by sending IP of SPAM. So, none of  
the log messages show sending IP when used in this environment.


How can we get spamd to log the sending ip? Alternatives?



Not sure how recent a version you'll need, but in at least 3.3 you  
can write a plugin that calls $permsgstatus->set_spamd_result_item()  
to add anything to the spamd logline.


Check the Shortcircuit plugin for an example.

Michael



Re: Do I need to do anything to maintain MySQL?

2009-10-05 Thread Michael Parker


On Oct 4, 2009, at 8:56 PM, Steven W. Orr wrote:


I did some googling, and the more I read, the more apparent that the
documentation is a little light.

So here are the questions that I think are really the 800 pound  
elephant in

the room:

* If I do set bayes_auto_expire to 0 and I am using MySQL then does  
any run of
sa-learn cause the expired rows of bayes_token to be removed if  
there are no

corresponding rows that relate back to bayes_seen?


No.  Also, I don't think you really understand how expiry works. I  
suggest you read the Expiration section of the sa-learn man page to  
get a better idea of what is going on.




* If I set bayes_auto_expire to 0, and I am using MySQL then do I  
need to run

a cron job which does this? How often should I run it?

sa-learn --force-expire --sync



Yes, if you turn off auto expire then you will need to come up with  
some way to run it manually.  How often really depends on your mail  
traffic.  However, unless you have a poorly tuned MySQL database or  
underpowered system I would recommend just keeping auto expire turned  
on.  It's very quick to run under SQL and should not slow down your  
usual mail processing.



* I set bayes_sql_override_username to something. If I did not, then  
do I have
to have a cron job as described above that runs as each user that is  
listed in

bayes_vars.username?


Yes, you would have to run it for each user.



* If I set bayes_auto_expire to 1, then does every update of any row  
in the
spamassassin database try to clean up these rows that could be  
removed?




Please read the sa-learn man page for details on how expiration works,  
but what happens is that when a message is scanned by bayes, a small  
check is made to see whether an expiry run is needed.  In more recent  
versions of spamd this expiration happens after the result is returned  
to the client, so there should be no waiting.


Hope that answers some questions.

Michael



I'm hoping that I'm not ranting. Sorry.

--
Time flies like the wind. Fruit flies like a banana. Stranger things  
have  .0.
happened but none stranger than this. Does your driver's license say  
Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We  
are all- 000

individuals! What if this weren't a hypothetical question?
steveo at syslang.net





Re: Parallelizing Spam Assassin

2009-07-31 Thread Michael Parker


On Jul 31, 2009, at 1:55 AM, poifgh wrote:


I ran freshly build SA with Bayes and DNSBL turned off. Why am I not  
seeing

a linear increase in the throughput? Is a file locking creating the
bottleneck? If yes, which particular file is being locked? If no,  
what could

be the reason for this?


There could be many reasons, check out my talk (admittedly out of date  
a little but should still be mostly relevant) on High Performance  
Apache SpamAssassin at the following link:


http://people.apache.org/~parker/presentations/index.html

Keep in mind that you might also be seeing other factors like memory  
and disk I/O contention.  You don't really spell out your testing  
infrastructure, so it's not really clear whether you're even performing  
a valid test.


Also, I wouldn't necessarily expect to see a linear increase, although  
you might be able to take some easy steps for increasing your overall  
performance.


Michael



Re: URI-DNSBL problem with spamassassin 3.2.5

2009-07-09 Thread Michael Parker


On Jul 9, 2009, at 1:40 PM, Eddy Beliveau wrote:


but I do not find any timing.log file in my current directory or  
anywhere on my system!!


Did I miss something?



I doubt all the necessary hooks are in place for that plugin to work  
in 3.2.5, you'd need to run 3.3 to make use of that plugin.


Michael



Re: Plugging dspam into SA

2009-07-09 Thread Michael Parker


On Jul 9, 2009, at 7:07 AM, Frank DeChellis wrote:



If anybody has any advice or ideas, please let me know.



This is probably way beyond what you wanted to get into but the Bayes  
subsystem has plugin hooks so you could write your own dspam plugin to  
use.


I'm not aware of anyone trying it so the plugin interface might need  
some tweaking but I'd certainly be interested in the results.


Michael



Re: lookup user_prefs in SQL database (not using spamc)

2009-03-25 Thread Michael Parker


On Mar 25, 2009, at 9:30 AM, Guido wrote:

I believe it means you should take up this issue with the Amavisd-new  
support forum.


Since you are not RUNNING SpamAssassin/spamc/spamd then some parts  
of the configuration simply are not made effective in your  
situation.  You must look for a solution within the software that  
actually runs in your system, not something where only the  
libraries are used.


I believe you are right.
I thought it might be SA related, but it seems to be a problem of how SA
is used by amavisd-new.

Just for documentation some related discussions:
http://www.mail-archive.com/amavis-u...@lists.sourceforge.net/msg02541.html
http://www.engardelinux.org/modules/index/list_archives.cgi?list=amavis&page=0008.html&month=2007-08
http://sourceforge.net/mailarchive/message.php?msg_name=20090325101644.GA26940%40guido-leisker.de





Further documentation:

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=1145

It's one of the oldest bugs in the SpamAssassin bugzilla.

Patches are welcome of course.

Michael




Thanks

Guido






Re: Two servers, one database. A question

2009-02-14 Thread Michael Parker


On Feb 14, 2009, at 3:47 PM, Lindsay Haisley wrote:


On Sat, 2009-02-14 at 15:04 -0600, Bob Proulx wrote:

I would bet on Bayes/userpref queries being more efficient than the
spamc/spamd traffic.


I like that you are asking the question.  But I hate to guess at  
which
is better though.  The weakest benchmark data point is better than  
the
strongest guess.  Too often I have taken my best guess and been  
wrong.

In this case I would guess the opposite would be more efficient, that
the one spamc-spamd connection per message would be more efficient
than the many mysql queries per message, which is why I bring this  
up.


Well that's something to consider.  I had hoped when I subscribed to
this list to ask this question that I'd find people, possibly SA
developers on it, who had benchmarked the options I presented for
decision and could give me some definitive answers based on this,  
but it
appears that this isn't the case.  Instead I've found several people  
of

good will who don't seem to know a whole lot more about SA than I do,
but have given me some good points to think about.

Do you have any idea where I might inquire to get advice from people
with more precise knowledge?



This is the best place.  It's not a common setup, so I doubt that  
anyone really knows the correct answer.


One data point I'll add is that spamc has a compress mode that might  
be useful (spamc -z).  Also, it would take a little work on your end,  
but you can also pass in --headers to further reduce the spamc/spamd  
traffic.  Check out the spamc man page for more info.


One other thing related to MySQL.  I've never personally done it but  
I'm certain there are ways you could use MySQL proxy or perhaps even  
federated tables to manage this sort of thing.  MySQL proxy has lots  
of different functions, I'm sure compression is either one of them or  
at least something that can be easily bolted on.


Michael





--
Lindsay Haisley   | "Everything works|Accredited
FMP Computer Services |   if you let it" |  by the
512-259-1190  |(The Roadie)  |   Austin Better
http://www.fmp.com|  |  Business Bureau





Re: Secure spamd server

2009-02-04 Thread Michael Parker


On Feb 4, 2009, at 11:06 AM, Andre wrote:




On Wed, 4 Feb 2009, Matus UHLAR - fantomas wrote:


On 03.02.09 21:39, Andre wrote:
spamc is never called from Exim in this case, so the --ssl switch  
can't be
used. At least that is my understanding (maybe mis-understanding?)  
of the

situation.


Doesn't exim even have the option for running spamc? If it doesn't, that's  
a serious bug, I'd say...


Indeed it does - after rcpt as a local transport. However, in that  
case

you can't reject mail at smtp time anymore. It is also slower in my
experience. For those reasons you'd normally use spam scanning at smtp
time - where exim connects directly to spamd without the spamc step.


In that case I'd say the bug is that the exim spamd protocol support  
doesn't allow ssl.


Michael



Re: Test order

2009-01-08 Thread Michael Parker


On Jan 7, 2009, at 5:10 AM, Benny Pedersen wrote:



On Tue, January 6, 2009 03:11, Matt Kettler wrote:


Check your .pre files to make sure the shortcircuit plugin is loaded
in one of them. (Note: loadplugin statements added to local.cf will
NOT work, they should be in the .pre files)


is this so in 3.3 svn ?
in 3.2.5 it also works from local.cf; maybe i am wrong, but i have
seen a lot of plugins that do loadplugin in cf



loadplugin lines in cf files will work, there are some gotchas however.

If you wait to load a plugin in a cf, especially local.cf, then it will  
not be loaded during cf parsing.  If your plugin does any sort of rule  
parsing then you've missed the boat.


Michael



Re: bayes SQL delays

2008-11-03 Thread Michael Parker


On Nov 2, 2008, at 12:55 PM, Micah Anderson wrote:



I have spamd setup to use bayes in a mysql database, works fine. I've
turned off auto-expiry and instead run a cronjob to expire in the  
middle

of the night (removes about 40k tokens on a run). I've made the DB
innoDB so it can handle locking better. I've got mysql-based user  
prefs
coming from the same database server, and that works (not everyone  
wants
bayes). Autolearning is working, I chew through a lot of mail every  
day,

in general everything seems fine.

Except that my spamd server is overloaded, so I need a second one.  
So I
set up another spamd instance, with the exact same configurations as  
the

first, fire it up and it immediately starts blocking on the bayes
work. Average scantimes go from 1-2 seconds up to 35+ and the max
children get eaten up by blocking on the bayes work to the point where
it's pointless because too many processes are blocked. If I disable the
bayes_sql stuff in my local.cf, scantimes drop back to their expected
average of 1-2 seconds, but of course none of the BAYES tests will  
fire

and autolearning fails.

What gives?


Could be several things.

It's not clear from your description, but is the mysql server a separate  
server already, or was it local to the first spamd?  If it was local, you've  
now added the network to the equation, and that could introduce several  
issues: network latency, different behavior from mysql (e.g. hostname  
lookups), etc.


How often do you optimize the database?

Are you running with a global database? that can introduce some row  
level locking issues when accessed via multiple machines.


Which bayes storage module are you using? regular SQL? or the MySQL  
specific module?


It might be that you just need to perform some basic MySQL  
optimizations, since those are usually site specific it would be hard  
to offer advice.


Michael


Re: Merge of bases bayesians

2008-08-22 Thread Michael Parker


On Aug 22, 2008, at 5:36 AM, Eduardo Júnior wrote:


I have an e-mail server 1, which has SpamAssassin with a Bayesian  
database. And I also have another e-mail server 2, which has another  
Bayesian database.


Can I merge the two databases without overwriting either?

Thus, my new database would be the union of the two: new_base(A U B)



No, not really.  Just pick the one that has learned the most and use it.

Michael



Re: Re-injection

2008-08-05 Thread Michael Parker

On Aug 5, 2008, at 7:57 AM, LDB wrote:


Is it possible to re-inject a caught piece email that was
labeled HAM, in spamd or spamc and force it to learn it
as spam for bayes?




Yes.

First read up on the --allow-tell command line switch for spamd, if  
you're ok with the "risks" then start spamd up with that option.


Then you can use spamc --learntype= (ie spamc --learntype=spam)  
to send the message to spamd for learning.  It might also be necessary  
to add -u  to the spamc command as well.  More details  
available in the spamc man page.
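
For scripting this, a small Python sketch that builds the spamc command line is shown below. It assumes spamc is on the PATH and that spamd was started with --allow-tell. `build_learn_command` and `relearn` are hypothetical helper names, and the user name in the last line is a placeholder.

```python
import subprocess
from typing import List, Optional


def build_learn_command(learntype: str, user: Optional[str] = None) -> List[str]:
    """Build the spamc argument list for a learn request."""
    if learntype not in ("spam", "ham", "forget"):
        raise ValueError(f"unsupported learn type: {learntype}")
    command = ["spamc", f"--learntype={learntype}"]
    if user is not None:
        command += ["-u", user]
    return command


def relearn(message: bytes, learntype: str, user: Optional[str] = None) -> int:
    """Pipe a raw message to spamc on stdin and return spamc's exit code."""
    result = subprocess.run(build_learn_command(learntype, user), input=message)
    return result.returncode


command = build_learn_command("spam", "alice")
```

Calling `relearn(raw_message, "spam", "alice")` would then send the message for learning as that user; the meaning of the exit code is documented in the spamc man page.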


Michael



Re: Install mysq for bayes

2008-07-15 Thread Michael Parker


On Jul 15, 2008, at 4:04 AM, Alex Woick wrote:


Paolo De Marco schrieb am 11.07.2008 11:17:

I want to migrate to mysql for my bayes.
I have installed the perl modules and mysql, and modified local.cf.
When I run amavisd debug I see these lines:
Jul 11 11:16:36 mail.ial.fvg.it /usr/local/sbin/amavisd[17564]: (!!)TROUBLE in pre_loop_hook: Undefined subroutine &DBD::mysql::db::_login called at /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi/DBD/mysql.pm line 128.


You have DBI installed, but probably you are missing the DBD::mysql  
perl module. On my system (Fedora) it is contained in the perl-DBD-MySQL  
package.




Another guess is that the version of DBI that you have installed is  
not compatible with the version of DBD::mysql.


Check your version numbers and what they depend on.

Michael



Re: mysql AWL issue....

2008-07-08 Thread Michael Parker


On Jul 8, 2008, at 12:42 PM, Adam Harrison wrote:


I put the following in local.cf (passwords obscured):

bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn   DBI:mysql:spamassassin:sa-db.intelius.com:3306
bayes_sql_username  readwrite
bayes_sql_password  

auto_whitelist_factory  Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn            DBI:mysql:spamassassin:sa-db.intelius.com:3306

user_awl_sql_username   readwrite
user_awl_sql_password   


The bayes stuff works just fine, but the awl stuff still writes to  
the local disk instead of mysql. And it looks like it’s trying to  
connect to the root account in mysql, as the logs say:


Jul  7 15:57:25 smtp5.sea.intelius.com spamd[13024]: auto-whitelist: sql-based connected to DBI:mysql:spamassassin:sa-db.intelius.com:3306
Jul  7 15:57:25 smtp5.sea.intelius.com spamd[13024]: auto-whitelist: sql-based using username: root
Jul  7 15:57:25 smtp5.sea.intelius.com spamd[13024]: auto-whitelist: sql-based get_addr_entry: no entry found [EMAIL PROTECTED]|ip=none
Jul  7 15:57:25 smtp5.sea.intelius.com spamd[13024]: auto-whitelist: sql-based [EMAIL PROTECTED]|ip=none scores 0/0
Jul  7 15:57:25 smtp5.sea.intelius.com spamd[13024]: auto-whitelist: AWL active, pre-score: 2.865, autolearn score: 2.865, mean: undef, IP: undef
Jul  7 15:57:25 smtp5.sea.intelius.com spamd[13024]: auto-whitelist: sql-based finish: disconnected from DBI:mysql:spamassassin:sa-db.intelius.com:3306
Jul  7 15:57:25 smtp5.sea.intelius.com spamd[13024]: auto-whitelist: post auto-whitelist score: 2.865




What am I doing wrong, that SA would still be going to the local  
disk for the AWL stuff?




What about the above messages makes you think it's writing to a local  
disk file?


The username shows as root because on init the code runs a sample message  
through to load all the libraries, and at that point it's running as the  
root user.  It's not trying to connect as root.


Michael




SpamAssassin is from rpm and is version  
spamassassin-3.2.4-1.el4.rf.  Perl is perl-5.8.8-2.el4s1. MySQL is  
5.0.18. And it’s running under Red Hat Enterprise Linux AS 4.6.


Thanks,
-Adam




Re: Short circuit priority doesnt seem to work

2008-06-26 Thread Michael Parker


On Jun 26, 2008, at 7:01 PM, Benny Pedersen wrote:



On Thu, June 26, 2008 23:09, Larry Nedry wrote:


Benny, you might want to read the docs:


docs need updating; all the tests i have done show this is not working
here



Hmmm, then you are running a faulty or modified version I guess. Here  
is the code:


  foreach my $priority (sort { $a <=> $b } keys %{$pms->{conf}->{priorities}}) {


If you've got a repro of it not working that way, I invite you to file  
a bug with that repro.


Michael





Re: Short circuit priority doesnt seem to work

2008-06-26 Thread Michael Parker


On Jun 26, 2008, at 4:09 PM, Larry Nedry wrote:


On 6/26/08 at 7:05 PM +0200 Benny Pedersen wrote:
make priority positive not negative, default all have 0 to start  
with, and

10 would be tested before 0 :-)


And again on 6/26/08 at 10:06 PM +0200 Benny Pedersen wrote:

On Thu, June 26, 2008 21:17, Michael Parker wrote:

Negative numbers come before positive numbers.


nope

order is positive to negative

you might find it is correct by testing more :)


Benny, you might want to read the docs:
<http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html>


Or just believe the person that actually wrote the code.

Michael


Re: Short circuit priority doesnt seem to work

2008-06-26 Thread Michael Parker


On Jun 26, 2008, at 12:05 PM, Benny Pedersen wrote:



On Thu, June 26, 2008 17:13, ram wrote:

How do I force SA to wait for the results of negative short-circuited rules
of higher priority before shortcircuiting mail as spam due to positive
ones


make priority positive not negative, default all have 0 to start  
with, and

10 would be tested before 0 :-)



That is not correct.

Negative numbers come before positive numbers.
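
A quick sketch of that ordering in Python, mirroring a plain ascending numeric sort of the priority values. The rule names and priority numbers below are invented for the example.

```python
# Hypothetical priorities table: priority value -> rules registered at it.
priorities = {
    0: ["BODY_RULE"],
    10: ["LATE_RULE"],
    -100: ["SHORTCIRCUIT_WHITELIST"],
    -50: ["SHORTCIRCUIT_BLACKLIST"],
}

# Ascending numeric sort: the lowest (most negative) priority runs first.
run_order = []
for priority in sorted(priorities):
    run_order.extend(priorities[priority])
```

So a rule that has to be evaluated before the default-priority rules needs a negative priority, not a positive one.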

Michael



Re: SQL DB schema issue

2008-05-28 Thread Michael Parker


On May 28, 2008, at 10:38 AM, Rocco Scappatura wrote:



Hello,

I'm using SA with SQL support under Amavisd-new. My DBMS is MySQL.

I'm preparing another antispam server and I've installed the latest
stable software available.

I've dumped the bayes DB (schema + data) from an already working machine and
I've restored it on the new machine.



How did you do this dump?  Which tables did you get?




But when I try to start amavisd in debug mode I get the following
errors:

May 28 17:37:29.010 av8.stt.vir /usr/local/sbin/amavisd[17102]: SpamAssassin debug facilities: info
bayes: database version 0 is different than we understand (3), aborting! at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/SQL.pm line 136.
bayes: database version 0 is different than we understand (3), aborting! at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/SQL.pm line 136.
May 28 17:37:30.155 av8.stt.vir /usr/local/sbin/amavisd[17102]: (!!)TROUBLE in pre_loop_hook: check: no loaded plugin implements 'check_main': cannot scan! at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/PerMsgStatus.pm line 164.
Suicide () TROUBLE in pre_loop_hook: check: no loaded plugin implements 'check_main': cannot scan! at /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/PerMsgStatus.pm line 164.



The Check plugin error is the worse problem and suggests something is
really wonky with your install.






While the version specified in the database is really '3'.

What could be the source of this error?




It looks for the version in the bayes_global_vars table; check to see
what value is in there.
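
A minimal Python sketch of that check follows. It uses an in-memory SQLite database as a stand-in for MySQL, and it assumes the table has `variable` and `value` columns with a `VERSION` row, which is how I recall the stock Bayes SQL schema; verify against your own dump. Returning 0 for a missing row is a choice made for this sketch, not a statement about what SQL.pm does internally.

```python
import sqlite3

# Stand-in database: SQLite in memory instead of MySQL.
# Column names and the 'VERSION' row are assumptions about the schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes_global_vars (variable TEXT PRIMARY KEY, value TEXT)"
)
conn.execute("INSERT INTO bayes_global_vars VALUES ('VERSION', '3')")

def stored_bayes_version(db):
    """Return the stored VERSION value as an int, or 0 if the row is missing."""
    row = db.execute(
        "SELECT value FROM bayes_global_vars WHERE variable = 'VERSION'"
    ).fetchone()
    return int(row[0]) if row else 0

version = stored_bayes_version(conn)
print("bayes db version:", version)
```

If the same SELECT returns no row on the restored machine, the dump probably did not include the contents of that table.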


Michael


Thanks,

rocsca





Re: MySQL and Size Of bayes_expiry_max_db_size

2008-05-27 Thread Michael Parker


On May 27, 2008, at 4:22 PM, Larry Nedry wrote:


Greetings,

This weekend I created a MySQL db to store my bayes tokens.  It seems to
be working well but I'm a little puzzled by the default size of
bayes_expiry_max_db_size.  I understand that the default size is 150,000
which seems very low as it took only one day to reach 100,000 tokens.

Was the default size set that low because of the performance of the
default db?  Is it reasonable to set it to a much higher number
considering that I am using a SQL db?



You should adjust it to whatever works best for your user base and the
resources you have available on your database.  The default value is
best suited to single users, so I wouldn't be surprised if it was too
low.


Michael



Thanks for any help!

Nedry





Re: Persistent DB connections

2008-04-22 Thread Michael Parker


On Apr 22, 2008, at 5:31 AM, Christoph Petersen wrote:

Hey guys,

for my setup I use a MySQL DB as the store for bayes and AWL. Every
process is opening, querying and closing its own DB connection which
results in latency and is not necessary.


Really? That's interesting. In all my tests MySQL has always done well;
it is very cheap to recreate MySQL connections, so the DBI Persistence
plugin has never been a big win at all for MySQL.




I tried MyDBI (the result of the Summer of Code 2007) but then I get
strange errors in my log:

warn: spamd: DBD driver has not implemented the AutoCommit attribute at /usr/local/lib/perl/5.8.8/DBI.pm line 689,  line 114.

warn: Use of uninitialized value in concatenation (.) or string at usr/local/share/perl/5.8.8/Mail/SpamAssassin/BayesStore/SQL.pm line 133,  line 28.

warn: bayes: database version is different than we understand (3), aborting! at usr/local/share/perl/5.8.8/Mail/SpamAssassin/BayesStore/SQL.pm line 136,  line 28.

Is there another plugin which can achieve persistent db connections?
Or some clues to fix the abovementioned issues?


You say you tried the original plugin from here:

http://wiki.apache.org/spamassassin/DBIPlugin

and got the same error?

I have to ask, are you sure it was working without the plugin installed?

Please provide the following information:

1) The output of the following command:

spamassassin --debug=generic,diag --lint

2) Your local.cf contents, feel free to block out any passwords.

3) What version of DBD::mysql you are running.

Thanks
Michael




BR
Christoph Petersen





Re: Bayes DB growing without bound; expiry not working

2008-04-21 Thread Michael Parker


On Apr 21, 2008, at 8:40 AM, Chris St. Pierre wrote:

On Mon, 21 Apr 2008, Michael Parker wrote:


select * from bayes_vars;


...
2289 rows in set (0.00 sec)


What user do you run bayes under on your MXs?


I think you've found the issue.  We run as spamd.

# sa-learn -u spamd --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0    1492123          0  non-token data: nspam
0.000          0     660634          0  non-token data: nham
0.000          0   73178711          0  non-token data: ntokens
0.000          0 1189775610          0  non-token data: oldest atime
0.000          0 1208785034          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count


That leads to two issues:

1.  I need to straighten things out and figure out why I've got a
strange mix of per-user and global data in my Bayes DB.  Whee.



You should use the bayes override username (bayes_sql_override_username)
if you want global and then just sa-learn -u  clear everything else
(PITA, I know).  I personally don't believe individual bayes dbs are an
issue, if you've got the space and CPU on your database machine.  See
below for some solutions.





2.  Does this mean that, if I use per-user Bayes, I have to run
expiration as each user individually?

Manual expiration was recommended to me a long time ago as a way to
increase database performance, but it seems like it may not be worth
it if I have to run N forced expirations, for potentially large values
of N.



This is true for DBM based bayes databases, but generally (with an
exception I'll talk about in a second) MySQL based bayes expiration is
very fast (just a few seconds).  I would go ahead and turn auto-expire
on, after running a manual expire to clear out the current backlog.


One reason that expiration slows down is an unoptimized db.  I've found
for my small uses if I run optimization every couple of weeks I get much
better performance. It looks like you get a lot more traffic so I would
recommend running it more often.  With frequent optimizations and
auto-expire your database will stay in much better shape.


Michael



Thanks for your help.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University





Re: Bayes DB growing without bound; expiry not working

2008-04-21 Thread Michael Parker


On Apr 21, 2008, at 8:17 AM, Chris St. Pierre wrote:


Consequently, my database is growing, apparently without bound.

Any ideas how I can get expiry to work properly again?  (Hopefully
without completely dumping the database?)



select * from bayes_vars;

What user do you run bayes under on your MXs?

Michael



Re: libspamc.so and bayes

2008-02-07 Thread Michael Parker

On Feb 6, 2008, at 4:49 AM, Or Goshen wrote:
Is it possible to use libspamc.so to tell spamd that a message is either
spam or ham ?

ie, imitate  "sa-learn --spam/--ham" using libspamc.so.

There doesn't seem to be any documentation about the library, all I
could find are comments in the header file which weren't really helpful.




Yes, look at message_tell:

/* Process the message through the spamd tell command, making as many
 * connection attempts as are implied by the transport structure. To make
 * this do failover, more than one host is defined, but if there is only
 * one there, no failover is done.
 */
int message_tell(struct transport *tp, const char *username, int flags,
                 struct message *m, int msg_class,
                 unsigned int tellflags, unsigned int *didtellflags);


You can look for some example usage in the actual spamc command:

   -L learn type, --learntype=type
       Send message to spamd for learning.  The "learn type" can be either
       spam, ham or forget.  The exitcode for spamc will be set to 5 if
       the message was learned, or 6 if it was already learned.

       Note that the "spamd" must run with the "--allow-tell" option for
       this to work.

   -C report type, --reporttype=type
       Report or revoke a message to one of the configured collaborative
       filtering databases.  The "report type" can be either report or
       revoke.

       Note that the "spamd" must run with the "--allow-tell" option for
       this to work.
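
For anyone who does not need the C library, a rough alternative is to shell out to spamc itself. The Python sketch below is only that: it wraps the `-L` option and the exit codes 5 and 6 quoted above, plus the `-u` username option of spamc. The helper names are hypothetical, and the `learn` function assumes spamc is on the PATH and spamd was started with `--allow-tell`.

```python
import subprocess

# Exit codes as quoted in the spamc man page excerpt.
LEARNED, ALREADY_LEARNED = 5, 6

def learn_command(learn_type, username=None):
    """Build the spamc argument list for a learn request."""
    if learn_type not in ("spam", "ham", "forget"):
        raise ValueError("learn type must be spam, ham or forget")
    cmd = ["spamc", "-L", learn_type]
    if username:
        cmd += ["-u", username]
    return cmd

def describe_exit(code):
    """Map a spamc exit code to a short description."""
    known = {LEARNED: "learned", ALREADY_LEARNED: "already learned"}
    return known.get(code, "not learned (exit code %d)" % code)

def learn(message_bytes, learn_type, username=None):
    """Pipe one message to spamc and describe the result."""
    result = subprocess.run(learn_command(learn_type, username),
                            input=message_bytes)
    return describe_exit(result.returncode)
```

A call such as `learn(raw_message, "spam", "alice")` would then report whether the message was newly learned or had been seen before.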


Michael

Re: Expiry problem

2008-01-24 Thread Michael Parker


On Jan 23, 2008, at 9:54 PM, Steven Stern wrote:



It's finally started to remove tokens, so I think I'm OK. We use SQL
bayes, so it was an easy matter to use

~  delete from bayes_token where atime > UNIX_TIMESTAMP();

to clean up the stuff from the future.




But now your bayes_vars table is broken/off.  You might want to update
those counts as well.
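
One way to bring the counts back in line is to recompute them from what is left in bayes_token. The Python sketch below runs against an in-memory SQLite stand-in with a cut-down schema; the column names (token_count, oldest_token_age, newest_token_age) follow the stock Bayes SQL schema as I recall it and should be checked against your own tables before running anything similar on a live database.

```python
import sqlite3

# Stand-in schema: SQLite in memory, only the columns this sketch needs.
# Column names are assumptions; verify them against your own database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bayes_vars (id INTEGER PRIMARY KEY, token_count INTEGER,
                         oldest_token_age INTEGER, newest_token_age INTEGER);
CREATE TABLE bayes_token (id INTEGER, token TEXT, atime INTEGER);
INSERT INTO bayes_vars VALUES (1, 5, 100, 9999999999);
INSERT INTO bayes_token VALUES (1, 'a', 100);
INSERT INTO bayes_token VALUES (1, 'b', 200);
INSERT INTO bayes_token VALUES (1, 'c', 300);
""")

def resync_vars(db, user_id):
    """Recompute token_count and the atime range from the remaining tokens."""
    count, oldest, newest = db.execute(
        "SELECT COUNT(*), MIN(atime), MAX(atime) FROM bayes_token WHERE id = ?",
        (user_id,),
    ).fetchone()
    db.execute(
        "UPDATE bayes_vars SET token_count = ?, oldest_token_age = ?, "
        "newest_token_age = ? WHERE id = ?",
        (count, oldest or 0, newest or 0, user_id),
    )
    db.commit()

resync_vars(conn, 1)
row = conn.execute(
    "SELECT token_count, oldest_token_age, newest_token_age "
    "FROM bayes_vars WHERE id = 1"
).fetchone()
print(row)
```

Here the stale token count of 5 and the future-dated newest atime are replaced by values derived from the three remaining tokens.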


Michael


Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-23 Thread Michael Parker


On Jan 23, 2008, at 6:37 AM, Rubin Bennett wrote:



Spamd output below:[EMAIL PROTECTED] ~]# spamd -q -D
[12373] dbg: logger: adding facilities: all
[12373] dbg: logger: logging level is DBG


Can you run this again and this time pass 1-2 msgs through just like you
would normally, instead of just the default prime-the-pump message.
Also, please remind me of your spamd startup options and maybe even
attach your local.cf (or wherever you're adding the sql config items)
file for good measure.


Michael


Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-22 Thread Michael Parker


On Jan 22, 2008, at 12:17 PM, Rubin Bennett wrote:



On Tue, 2008-01-22 at 10:45 -0600, Michael Parker wrote:

On Jan 22, 2008, at 10:12 AM, Rubin Bennett wrote:


WTF am I doing wrong?!


Not including debug logs in your message.

User prefs does not work with spamassassin, so you won't see anything
there, but you should be seeing something for Bayes SQL and AWL SQL if
they are configured correctly.


What do you mean?!  Isn't that what the user_scores_dsn is all about?!



The spamassassin script.  User prefs only works when you run via spamd.
But let's look at the debug output:




[31490] dbg: bayes: using username: root
[31490] dbg: bayes: database connection established
[31490] dbg: bayes: found bayes db version 3
[31490] dbg: bayes: Using userid: 1


Ok, this tells me that Bayes SQL looks to be running just fine.  If you
read sql/README.bayes it tells you what to look for to test if things
are working correctly.




[31490] dbg: bayes: corpus size: nspam = 2106, nham = 19051
[31490] dbg: bayes: tok_get_all: token count: 20
[31490] dbg: bayes: score = 0.472224419305046
[31490] dbg: bayes: DB expiry: tokens in DB: 133258, Expiry max size: 15, Oldest atime: 1193647841, Newest atime: 1201025739, Last expire: 1195029791, Current time: 1201025739


It even looks like you've got some data in there.


As to the user_prefs in SQL stuff, that will require spamd -D output.
Again, read sql/README for details on testing things, maybe you're just
not grepping for the right string.  When you run spamd under debug it
will show you the exact sql query it is sending.  You can run that query
by hand to see if it's giving back meaningful data.  You might also turn
on query logging on your MySQL server (assuming you have the capability)
and see what it says spamd is sending.


Michael


Re: Spamd and MySQL userprefs/ AWL/ Bayes

2008-01-22 Thread Michael Parker

On Jan 22, 2008, at 10:12 AM, Rubin Bennett wrote:


WTF am I doing wrong?!


Not including debug logs in your message.

User prefs does not work with spamassassin, so you won't see anything
there, but you should be seeing something for Bayes SQL and AWL SQL if
they are configured correctly.


Try running spamassassin -D --lint and sending the output to the list,
only then can folks really help you.


Michael



Re: sa 32x-branch 'make test' fails @ "t/spamc_optL.t" (among others ...) on freebsd

2007-12-31 Thread Michael Parker


On Dec 31, 2007, at 10:23 AM, snowcrash+sa wrote:



but bear in mind that it will probably only get attention from other
jail users


heh. understood. and, expected.

alas, i know it's wasted breath to argue that the prevalence of SA-(&
everything else, for that matter)-in-jails/VMs is only going to
increase, and that this will not be an atypical use-case ... but, for
now, NIH-syndrome, i s'pose ;-)



Not wasted breath as long as you'll accept:

Patches Welcome!

as a response :)

Michael


Re: Mondo bayes_toks - millions of entries

2007-11-30 Thread Michael Parker

On Nov 30, 2007, at 1:56 PM, Wes wrote:

Well, spamd is apparently doing things far more efficiently than
"sa-learn --restore".  Tokens are loading into the DB much faster than
the restore, and postmaster is hardly ever a blip in 'top' (at least so
far).  When running the restore, postmaster was sitting up about 60-80%
CPU constantly.


Learning normally can take advantage of inserting/updating tokens in
batches.  When doing a restore it has to insert each token separately.


BTW, while the best effort was put into the postgresql support, I'm sure
it could use help so if anyone wants to hack on it and submit patches
I'm certain that the developers would be more than happy to take a look.


Michael


Re: SQL-based AWL and Bayes not working with 3.2.3

2007-11-27 Thread Michael Parker

On Nov 27, 2007, at 10:16 AM, Rene Caspari wrote:

In my case it is a bug :-)

Because I don't have any chance to get user specified bayes db working
which come from a SQL database.



It's actually a behavior change, at least for me.  How are you running
spamd?  If you are running with -q or --sql-config, then you will need
to also run with -u .  A bug was recently opened with a patch but all
the patch does is make sure that you supply the -u before it will run
correctly.


Here is the bug for reference:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5728

Michael



Re: SQL-based AWL and Bayes not working with 3.2.3

2007-11-19 Thread Michael Parker

On Nov 19, 2007, at 8:57 AM, Andrew Hearn (AAISP) wrote:


Rene Caspari wrote:

Hi,

I'm using spamassassin 3.2.3 with userspecific rules from an SQL
database:

/etc/spamassassin/local.cf:
user_scores_dsn DBI:mysql:spamassassin:localhost
[...]
bayes_store_module  Mail::SpamAssassin::BayesStore::SQL
[...]
auto_whitelist_factory  Mail::SpamAssassin::SQLBasedAddrList

spamc is called by procmail.
/etc/procmailrc:
:0fw
* < 256000
| /usr/bin/spamc -U /var/run/spamd.sock -u $USER

(where $USER is created by Postfix:
/usr/bin/procmail -t -m USER=${recipient} SENDER=${sender} /etc/procmailrc)


Since I updated to 3.2.3 (Debian Volatile) I get the error message in
/var/log/mail.log:
[...] spamd: still running as root: user not specified with -u, not found, or set to root, falling back to nobody


After this, spamassassin uses the userspecific SQL tables with the user
"nobody", not the specific user who is the recipient of the mail being
scanned.


Do you have an idea how I can resolve this?


I think I have the same problem too, on one of our test servers. This
is one I'm running 3.2.3 on, and using the same config from our other
3.1.7 machines which are happy with Bayes...

User preference is being used, as I can tell that as the required score
is being set correctly from the preferences.



I've also recently seen this issue.  I believe it relates to the recent
setuid changes in spamd.  What I'm not sure of is if it's a BUG or just
a behavior change that we need to account for.  If you run spamd in
debug mode you can see it handle the passed in user and then force the
drop back to user nobody.


Michael


Re: Book Recommendation

2007-10-26 Thread Michael Parker
cpayne wrote:
> Guys,
> 
> I am looking for books on Spamassassin. And I am wanting to know what
> you guys recommend?
> 

http://spamassassin.apache.org/
http://wiki.apache.org/spamassassin/

Seriously.

There are a couple of books on SA, but they go out of date very quickly.
 You're much better off using the home page documentation and the wiki.
 Also, you've already found this list, use it.

Michael


Re: change mysql username in bayes_var table

2007-10-14 Thread Michael Parker
Rob Mangiafico wrote:
> We're converting a server from per user mysql bayes to sitewide bayes 
> using the "bayes_sql_override_username USERNAME" command. We want to use 
> the data for one username in the mysql db already that has quite a nice 
> buildup of trained ham and spam.
> 
> Can we simply change the username in the mysql db in the bayes_var table 
> (to "root" for example), and then set:
> bayes_sql_override_username root
> ?
> 
> Then we can run the "sa-learn --force-expire --sync" as the root user for 
> easy nightly updating.
> 
> Would the above process work? Thanks.
> 

Yes, it should.  I suggest you shut everything down, make the sql change
and then the config change, then start things back up.

Michael






Re: Learn Ham/Spam to more than Bayes DB

2007-10-14 Thread Michael Parker
Magnus Anderson wrote:
> Hi,
> 
> I have a script that runs every night with sa-learn to learn new ham/spam
> messages for every user.
> I do this with running these commands
> 
> /usr/bin/sa-learn --ham --no-sync -u {$_array['sa_user']} {$_dir['inbox']}
> /usr/bin/sa-learn --ham --no-sync -u {$_array['sa_user']} {$_dir['ham']}
> /usr/bin/sa-learn --spam --no-sync -u {$_array['sa_user']} {$_dir['spam']}
> 
> These mail boxes are in mbox format. I also rely on -u for overriding the
> username, as I use Bayes and AWL in MySQL for each username. 
> 
> The problem now is though that I want to report ham/spam to more than just
> my Bayes DB if users want to do that. I want to report spam to
> Pyzor,Razor,Dcc and SpamCop.
> 
> But sa-learn doesn't support this. Does anyone have an idea on how to make
> this work?

spamassassin -r does what you want, but there is no way to specify the
username on the command line.  You could write a script using the SA API.

Michael


Re: Bayes only if -u specified?

2007-10-12 Thread Michael Parker
Jason Frisvold wrote:
> Hi all,
> 
> Quick question.  Is it possible to set up spamassassin to use Bayes
> only if the -u option is passed via spamc?  I'm using simscan to call
> spamassassin and if the user is not specified, it falls back to the
> nobody account.  The bayesian database fills up with tons of tokens
> that I believe are hurting, rather than helping, the identification of
> spam.
> 
> Thanks,
> 

Two options, since you're using spamc/spamd.

1) Put user configs into SQL and for user nobody set use_bayes 0, you
might get similar results if you give user nobody an actual home
directory and a user_prefs file, but I've never tried that.

2) There is a plugin hook that you can use that allows you to authorize
bayes for specific users.  It would be pretty trivial to write an
everyone except user nobody plugin to do what you want.  Here is a
sample plugin you could build off of:
http://wiki.apache.org/spamassassin/AuthzUserPlugin

Michael


Re: Bayes innodb problems

2007-09-26 Thread Michael Parker
micah wrote:
> On Wed, 26 Sep 2007 17:54:05 -0700, John D. Hardin wrote:
> 
>> On Wed, 26 Sep 2007, Micah Anderson wrote:
>>
>>> SELECT count(*)
>>>FROM bayes_token
>>>   WHERE id = '4'
>>> AND ('1190846660' - atime) > '345600';
>> Who the hell wrote *that* query? Is MySQL smart enough to rearrange that
>> equation to give an indexable comparison?
> 
> That comes from /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pl line 
> 243. It seems to calculate the expire delta, but in a way that can't use 
> an index.
> 
> Maybe that query should be changed from:
> 
> AND (? - atime) > ?" 
> 
> to:
> 
> AND atime < ? + ?"
> 

Can someone please open up a Bugzilla bug for this so it can be tracked?
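
For whoever files the bug, a small Python sketch of the rewrite may help. It is not the SQL.pm code, just an illustration on an in-memory SQLite table with made-up rows and a hypothetical index name. Note the algebra: `(now - atime) > delta` rearranges to `atime < now - delta` (a minus, not a plus), which leaves the bare column on one side so an index on atime has a chance of being used.

```python
import sqlite3

# Stand-in table: SQLite in memory with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bayes_token (id INTEGER, token TEXT, atime INTEGER)")
conn.execute("CREATE INDEX bayes_token_idx ON bayes_token (id, atime)")

now, delta = 1190846660, 345600
# One token every 12 hours going back from 'now'.
rows = [(4, "t%d" % i, now - i * 43200) for i in range(20)]
conn.executemany("INSERT INTO bayes_token VALUES (?, ?, ?)", rows)

def count_original(db, user_id, now, delta):
    """Form from the thread: arithmetic applied to the column."""
    return db.execute(
        "SELECT COUNT(*) FROM bayes_token WHERE id = ? AND (? - atime) > ?",
        (user_id, now, delta),
    ).fetchone()[0]

def count_rewritten(db, user_id, now, delta):
    """Equivalent form with the bare column: atime < now - delta."""
    return db.execute(
        "SELECT COUNT(*) FROM bayes_token WHERE id = ? AND atime < ?",
        (user_id, now - delta),
    ).fetchone()[0]

original = count_original(conn, 4, now, delta)
rewritten = count_rewritten(conn, 4, now, delta)
print(original, rewritten)
```

Both forms count the same rows here; whether MySQL actually picks the index for the second form is something to confirm with EXPLAIN on a real database.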

Thanks
Michael


Re: bayes_seen = 256GB

2007-09-19 Thread Michael Parker
Dave Koontz wrote:
> Theo and all.  I know this topic comes up on occasion, but I am not sure
> I've ever seen an explanation as to why the bayes_seen file is not auto
> pruned along with the bayes db file.  Since tokens expire in the main DB
> file, what is the purpose of having a seen file to unlearn tokens which
> may have long ago been purged?   IMO, it would seem logical to also
> purge the seen file at some sort of cycle so it can't grow so
> excessively large.
> 

In order to expire from bayes_seen you have to know that there are no
longer any tokens from a given msg in the bayes_token database.  This is
a hard problem, mapping tokens to msgs, so it wasn't done.  Likewise no
one ever did anything about expiring the bayes_seen entries.

Sounds like a good project, there might even be a bugzilla enhancement
opened already.

Patches are welcome.

Michael



> Theo Van Dinter wrote:
>> On Wed, Sep 19, 2007 at 03:23:50PM -0600, Mr. Gus wrote:
>>   
>>>> The file bayes_seen has grown in size to 256GB!  (274992939008)
>>>> How do I cap the size limit of this file? I want to have it not grow larger
>>>> than say 800mb at the most!
>>>>
>>> You need to expire old bayes tokens. The limit is set not as a size, but as
>>> 
>> Expiring bayes tokens does nothing to the bayes_seen file.  There is no
>> expiry for bayes_seen.
>>
>> If the seen file is bigger than you'd like, I'd just rm the file.
>>
>>   
> 



Re: SQL error: Deadlock found when trying to get lock

2007-09-19 Thread Michael Parker
pennywise wrote:
> Hello together!
> 
> I ´ve got following problem with my spamassassin which I couldn´t solve.
> When I use
> 
>  su vscan -c '/usr/local/bin/sa-learn -D --force-expire --sync'
> 
> I got this error message:
> 
> [72597] dbg: bayes: token_expiration: SQL error: Deadlock found when trying
> to get lock; try restarting transaction

Generally these are harmless, but it's possible that the code isn't
handling the "try restarting transaction" bits correctly.

Can you please file a bug?  http://issues.apache.org/SpamAssassin/

Thanks
Michael

> expired old bayes database entries in 161 seconds
> 26158500 entries kept, 0 deleted
> token frequency: 1-occurrence tokens: 2.80%
> token frequency: less than 8 occurrences: 0.91%
> [72597] dbg: bayes: expiry completed
> 
> This is my Server:
> FreeBSD 6.0
> mysql-server-4.1.16 (innoDB)
> amavisd-new-2.5.2,1
> p5-Mail-SpamAssassin-3.2.3
> 
> Can anybody help me? Thanks for your help.
> 
> Best regards,
> Pennywise
> 
> 
> 
> 
> 
> 




Re: List of 600,000 IP addresses of virus infected computers

2007-09-10 Thread Michael Parker
The users list is not really an appropriate place to advertise your
spam/virus filtering business.

Please do not feed the trolls.


Thanks
Michael


Re: Who wants my spam - seriously!

2007-09-06 Thread Michael Parker
Please do not feed the trolls.

Michael


Re: header /^\Q...\E$/m

2007-09-01 Thread Michael Parker
[EMAIL PROTECTED] wrote:
> < so whenever one uses a ^ or $ in a pattern, one is almost obliged to
> < append a /m flag, otherwise one risks being at a mercy of malicious
> < senders... Depending on a situation, this can be a security risk.
> 
> Sure wish Mail::SpamAssassin::Conf would mention all this where it
> discusses headers. Also it should mention loss of \Q\E.
> 

Apache SpamAssassin is an open source project.  The source and
documentation are open and available to all.  This means that you are
welcome to update any documentation and submit patches, via bugzilla,
for submission to the project.

or "Patches Welcome"

Michael


Re: spamd keeps running at 99% CPU until i kill the process

2007-08-28 Thread Michael Parker
Richard Hobbs wrote:
> Hello,
> 
> Could the size of "bayes_seen" and "bayes_toks" be causing this timeout?
> 

No, those aren't really that big, but it does look like you have an
expiration problem.

To solve your immediate problem you could just turn off bayes, that will
get mail flowing again and then you can address your real problem.

I'd guess that the spike in CPU you are seeing is due to bayes
expiration running and then getting killed off.  Expiration can be very
IO intensive and it locks the database while running, which can cause a
backup depending on your setup.  To make matters worse it looks like
you've got something timing out and killing off the expiration so it's
not allowed to complete.

Long term, I'd suggest turning off auto expiration and then running
sa-learn --force-expire via a cronjob.  I believe there is good
information on setting this up on the wiki.

You can remove any *.expire* files that are older than 5 minutes, they
are left over from previously timed out expiration attempts.
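
A cautious way to find those leftovers is sketched below in Python. It only lists candidates; the function name is hypothetical, the 300-second threshold comes from the "older than 5 minutes" rule of thumb above, and the `*.expire*` pattern is an assumption based on the bayes_toks.expire files in the listing quoted further down.

```python
import glob
import os
import time

def stale_expire_files(directory, max_age_seconds=300, now=None):
    """List *.expire* files not modified within max_age_seconds."""
    now = time.time() if now is None else now
    pattern = os.path.join(directory, "*.expire*")
    return sorted(
        path for path in glob.glob(pattern)
        if now - os.path.getmtime(path) > max_age_seconds
    )

# Example (dry run): print candidates rather than deleting them.
# for path in stale_expire_files(os.path.expanduser("~/.spamassassin")):
#     print("would remove", path)
```

Reviewing the printed list before removing anything avoids deleting a file that belongs to an expiration run still in progress.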

Like I said, check out the wiki, you should find more information there
on the problem and possible solutions.

Michael
> ==
> mail:/home/spamcheck/.spamassassin# ls -l
> total 101060
> -rw---  1 spamcheck spamcheck  2637824 2007-08-28 12:42 auto-whitelist
> -rw---  1 spamcheck spamcheck        6 2007-02-27 16:35 auto-whitelist.mutex
> -rw---  1 spamcheck spamcheck       56 2007-08-28 12:52 bayes.lock
> -rw---  1 spamcheck spamcheck        6 2007-02-27 16:33 bayes.mutex
> -rw---  1 spamcheck spamcheck 10530816 2007-08-28 11:11 bayes_seen
> -rw---  1 spamcheck spamcheck 83738624 2007-08-28 12:52 bayes_toks
> -rw---  1 spamcheck spamcheck  1298432 2007-02-27 18:14 bayes_toks.expire1698
> -rw---  1 spamcheck spamcheck  1327104 2007-02-28 02:17 bayes_toks.expire17474
> -rw---  1 spamcheck spamcheck  2473984 2007-02-28 02:31 bayes_toks.expire17865
> -rw---  1 spamcheck spamcheck  1302528 2007-02-28 02:48 bayes_toks.expire17866
> -rw---  1 spamcheck spamcheck  1335296 2007-02-28 03:17 bayes_toks.expire18715
> -rw---  1 spamcheck spamcheck  2572288 2007-02-28 03:37 bayes_toks.expire19618
> -rw---  1 spamcheck spamcheck   655360 2007-02-28 09:19 bayes_toks.expire28928
> -rw---  1 spamcheck spamcheck  1302528 2007-08-28 10:14 bayes_toks.expire3058
> -rw---  1 spamcheck spamcheck  1302528 2007-08-28 10:34 bayes_toks.expire3059
> -rw---  1 spamcheck spamcheck  1298432 2007-02-27 17:33 bayes_toks.expire31684
> -rw---  1 spamcheck spamcheck  1302528 2007-02-27 18:42 bayes_toks.expire31685
> -rw---  1 spamcheck spamcheck  5017600 2007-08-28 11:08 bayes_toks.expire4625
> -rw---  1 spamcheck spamcheck  5013504 2007-08-28 12:35 bayes_toks.expire7150
> -rw---  1 spamcheck spamcheck  5013504 2007-08-28 12:52 bayes_toks.expire7925
> -rw-r--r--  1 spamcheck spamcheck     1175 2005-07-20 14:23 user_prefs
> mail:/home/spamcheck/.spamassassin#
> ==
> 
> If so, what can i do about this?
> 
> Thanks again,
> Richard.
> 
> 
> Richard Hobbs wrote:
>> Hello,
>>
>> Mark Martinec wrote:
>>> Richard,
>>>
>>>> To add information to this problem, it appears that spamd does
>>>> eventually give up after 5 minutes
>>> Capture a message causing trouble from a MTA queue,
>>> and feed it to a command line spamassassin with -t -D options.
>> I would love to do this, but it's not the same messages all the time -
>> in a batch of identical messages, some get through and others cause the
>> hanging. It seems to be completely random.
>>
>> Do you have any other ideas?
>>
>> Thanks again,
>> Richard.
>>
> 



Re: Really Stupid Question: Plugins

2007-07-18 Thread Michael Parker
Skip Brott wrote:
> I haven't yet had to implement any pdf plugins, but I am looking to do so.
> I am running SA 3.1.9 and perl 5.8.8.  From what I can see, my plugins are
> here:
> 
> /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/
> 
> And there is no related folder for 5.8.8
> 
> Is that the location where I want to install the plugin?
> 

I usually recommend that people place third party plugins into their
site or local rules directory (ie /etc/mail/spamassassin) and then
specify that path in the loadplugin line.

For instance, if you download MyPlugin.pm from the wiki, copy it to
/etc/mail/spamassassin.  Create a myplugin.pre file and put the following:

loadplugin MyPlugin /etc/mail/spamassassin/MyPlugin.pm


In the above config line the name of the plugin is actually the perl
package name, so if it's Mail::SpamAssassin::Plugin::MyPlugin then the
config line will look like:

loadplugin Mail::SpamAssassin::Plugin::MyPlugin /etc/mail/spamassassin/MyPlugin.pm

Mucking around with the site_perl lib directories by hand is asking for
trouble.

Michael


Re: PDFText Plugin for PDF file scoring - PDFText2.pm for ver 3.2

2007-07-16 Thread Michael Parker
Theo Van Dinter wrote:
> 
> IMO, if people find this a useful enough feature of 3.2, it's a relatively
> trivial change in the code as I recall, so a bugzilla request to backport
> may get somewhere for a future 3.1 release.
> 

I would +1 a backport.

Michael


Re: No Bayes!!

2007-06-28 Thread Michael Parker
Lindsay Haisley wrote:
> On Thu, 2007-06-28 at 15:43 -0400, Theo Van Dinter wrote:
>> On Thu, Jun 28, 2007 at 02:27:36PM -0500, Lindsay Haisley wrote:
>>> So what's the best fix for this?  Should one just freeze SA at an
>>> earlier version on a production server until this is fixed upstream?  Is
>>> upstream aware of the problem and working on a fix for it?
>> You need to debug your installation and figure out what the problem is.
>> Bayes works fine in 3.2.
> 
> Obviously, for some of us, it doesn't.  I can take the time to determine
> the conditions that cause the failure, but I don't have a lot of time to
> work on debugging this kind of thing if my installation works fine with
> an earlier version of SA.  If the developers upstream are aware of the
> problem and working on it, then any debugging I might do would very
> likely be a waste of my time - hence my question.

I can't recall a bug open for anything like this.  Please visit
http://issues.apache.org/SpamAssassin/ and file a complete bug report.
Please describe the exact problem you are seeing as well as full debug
output.  A random thread on the users list won't necessarily get the
developers' attention.

The developers are not aware of such a problem, best bet is to make them
aware.

I myself have been using Bayes SQL longer than anyone and have had no
problems recently upgrading from 3.1.8 to 3.2.  Also, the Bayes code has
been very stable, with little to no changes over the last few releases,
especially in the storage code, so it's likely a config or environment issue.

Without proper debugging it will be hard to tell what exactly is the cause.


Michael


Re: per-user rules from mysql

2007-05-08 Thread Michael Parker
Duane Hill wrote:
> 
>   header L_TO_ME ToCc =~ /[EMAIL PROTECTED]/
>   describe L_TO_ME Email addressed to me
>   score L_TO_ME 0.010
> 

You can't do rules with SQL user prefs, not even with allow_user_rules.

Only non-admin config options are allowed.

Michael


Re: Problem upgrading from 3.1.8 to 3.1.20, check.pm

2007-05-07 Thread Michael Parker
[EMAIL PROTECTED] wrote:
> Hi,
> 
>   I've been upgrading several stable servers running 3.1.8 for months
> without any issues to 3.1.20, and got a problem in one of them. When
> trying to restart spamd, I get this:
> 
> @4000463ee4f622539324 [5532] error: check: no loaded plugin
> implements 'check_main': cannot scan! at
> /usr/local/share/perl/5.8.4/Mail/SpamAssassin/PerMsgStatus.pm line 164.
> @4000463ee4f622543f04 check: no loaded plugin implements
> 'check_main': cannot scan! at
> /usr/local/share/perl/5.8.4/Mail/SpamAssassin/PerMsgStatus.pm line 164.
> 
> The file v320.pre has its corresponding:
> 
> loadplugin Mail::SpamAssassin::Plugin::Check
> 
> And the Check.pm file is at
> /usr/local/share/perl/5.8.4/Mail/SpamAssassin/Plugin/ with the rest of
> plugins, being worldwide read accesible.
> 
> Is this some kind of bug or am I doing something wrong (or maybe I'm
> missing something).
> 

I doubt there is some kind of bug, otherwise a lot more people would be
having an issue.  What does spamassassin -D --lint say?  Are you sure
it's finding the v320.pre file?  Feel free to reply with the output.

Michael


Re: Invalid use of \\ in string literal from postgresql

2007-05-04 Thread Michael Parker
Graham Murray wrote:
> I am using spamassassin 3.2.0 and Postgresql 8.2.4 for bayes and awl.
> 
> I am seeing several messages from Postgresql like the following
> 
> spamd[18408]: WARNING: nonstandard use of \\ in a string literal
> spamd[18408]: LINE 1: select put_tokens(1,'{"003272260274052"...
> spamd[18408]:  ^
> spamd[18408]: HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
> 
> In both /var/log/mail (where these are pasted from) and the postgresql logs.
> 

This is due to the way PostgreSQL handles strings in later versions.
You can see some of the discussion in this bug:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5299

I would welcome patches from anyone who would like to provide a fix for
this.  Our PostgreSQL support is geared towards 7.X and some of the
advances in 8.X cause issues with a couple of things.  Sadly I don't
have enough time to maintain the support at the moment.
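
To illustrate what the HINT in the log is asking for, here is a minimal
sketch of the escape string syntax for a plain text value.  The function
name is hypothetical and this is not the SpamAssassin code; in real code
the driver's own quoting or bind parameters are the safer route, and the
binary token data passed to put_tokens needs more care than this shows.

```python
def pg_escape_string_literal(text: str) -> str:
    """Render text as a PostgreSQL escape-string literal (E'...')."""
    # Inside E'...' a backslash starts an escape sequence, so a literal
    # backslash is doubled; a single quote is doubled as in standard SQL.
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return "E'" + escaped + "'"


if __name__ == "__main__":
    # A value containing one backslash becomes E'a\\b' in the SQL text.
    print(pg_escape_string_literal("a\\b"))
```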

Michael


Re: ANNOUNCE: Apache SpamAssassin 3.2.0 available

2007-05-04 Thread Michael Parker
Anders Norrbring wrote:
> 
> I just ran into a big problem..
> 
> [25735] warn: bayes: database version is different than we understand
> (3), aborting! at
> /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/BayesStore/SQL.pm
> line 136

Double check your config for your database, make sure it is correct.

Possibly you've wiped out some portion of your database?  This wouldn't
be anything that SpamAssassin would have done.

The Bayes code received almost zero updates between 3.1 and 3.2 so I
can't imagine it's anything like that.

Michael




Re: problem with Mail::SpamAssassin::PerMsgStatus - SA 3.2.0

2007-05-04 Thread Michael Parker
Maciej Friedel wrote:
> I can't make sa-compile because
> 
> [EMAIL PROTECTED]:~# spamassassin --lint
> [23577] warn: Couldn't get Connecting IP header X-SA-Exim-Connect-IP for
> message <[EMAIL PROTECTED]>, skipping greylisting call 
> [23577] warn: rules: failed to run CG_FUJI_JPG test, skipping:
> [23577] warn:  (Can't locate object method "image_name_regex" via
> package "Mail::SpamAssassin::PerMsgStatus" at (eval 2376) line 765.
> [23577] warn: )
> [23577] warn: rules: failed to run CG_DOUBLEDOT_GIF test, skipping:
> [23577] warn:  (Can't locate object method "image_name_regex" via
> package "Mail::SpamAssassin::PerMsgStatus" at (eval 2376) line 892.
> [23577] warn: )
> [23577] warn: rules: failed to run CG_SONY_JPG test, skipping:
> [23577] warn:  (Can't locate object method "image_name_regex" via
> package "Mail::SpamAssassin::PerMsgStatus" at (eval 2376) line 1327.
> [23577] warn: )
> [23577] warn: rules: failed to run CG_CANON_JPG test, skipping:
> [23577] warn:  (Can't locate object method "image_name_regex" via
> package "Mail::SpamAssassin::PerMsgStatus" at (eval 2376) line 2224.
> [23577] warn: )
> [23577] warn: lint: 4 issues detected, please rerun with debug enabled
> for more information


I saw mention that a newer ImageInfo plugin had this eval.  I'm guessing
you're running 3.2.0 with the older ImageInfo plugin but still have
these newer eval calls in a .cf file.  Either a) update ImageInfo in
your install or b) remove the .cf file with the bad eval calls.

Michael



> 
> cpan[1]> m /Mail::SpamAssassin::PerMsgStatus/
> CPAN: Storable loaded ok (v2.16)
> Going to read /root/.cpan/Metadata
>   Database was generated on Fri, 04 May 2007 03:10:10 GMT
> Module id = Mail::SpamAssassin::PerMsgStatus
> CPAN_USERID  JMASON (Justin Mason <[EMAIL PROTECTED]>)
> CPAN_VERSION undef
> CPAN_FILE    J/JM/JMASON/Mail-SpamAssassin-3.2.0.tar.gz
> UPLOAD_DATE  2007-05-02
> MANPAGE      Mail::SpamAssassin::PerMsgStatus - per-message status
> (spam or not-spam)
> INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/PerMsgStatus.pm
> INST_VERSION undef
> 
> i using FuzzyOCR and sa-exim for greylisting 
> 
> Maciek
> 



Re: Question on use of SpamCop plugin

2007-04-19 Thread Michael Parker
Steven W. Orr wrote:
> What I've currently been using is this script:
> 
> #! /bin/bash
> exec tee >(mail [EMAIL PROTECTED]) | sa-learn --spam
> 
> Is there an advantage to using -r over what I have? (something like)
> exec tee >(spamassassin -r) | sa-learn 
> 

-r will also perform the sa-learn portion for you so no need to call it
separately.

Michael


Re: Mass-Check Hangs

2007-04-03 Thread Michael Parker
There is a bug in the 3.1 ArchiveIterator code that causes things to
hang on msgs that are too large or too small.  This got fixed in the 3.2
code and probably needs to be backported.  Feel free to open up a bug.

Thanks
Michael

Larry Nedry wrote:
> Hi All,
> 
> I'm trying to use mass-check to test the accuracy of a plugin that I'm
> developing.  If I run mass-check without the -j option (single process) it
> takes a few hours for it to finish a corpus of about 60,000 emails.  If I
> use the --net option it could a day or two to complete.  Of course if I run
> it with the -j option it is much faster but almost always mass-check will
> hang at a seemingly random place.  I've seen it hang at less than 5%
> complete and a few times it got as far as 98% complete.  And it doesn't
> matter if -j=2 or -j=48, it still hangs.
> 
> Once it hangs I can let it sit for hours without seeing any network, disk
> or CPU activity.  I still have plenty of free memory so swapping is not the
> issue.
> 
> Are others running into this problem?  Is this a bug in mass-check?  Is
> there a newer (fixed) version that will work with SA 3.1.18?  Or am I
> missing something important?
> 
> My setup:
> Mac Pro Quad Xeon 3.0 Ghz
> Fedora Core 4 or Mac OS X 10.4.8 (same results)
> 5 GB RAM
> SpamAssassin 3.1.18
> 
> Directory layout:
> SA3.1.18/rules/
> SA3.1.18/masses/
> SA3.1.18/masses/ham/  (corpora)
> SA3.1.18/masses/spam/ (corpora)
> 
> My Command line:
> # ./mass-check --progress --noisy -c=../rules spam:mbox:./spam ham:mbox:./ham
> 
> I've seen the same problem running under both Fedora Code 4 and Mac OS X
> 10.4.8.
> 
> I'm currently using just the default rules that are in the ../rules folder.
> 
> What is the purpose of the mass_prefs file?
> Am I supposed to edit the mass-check.cf file?
> 
> Thanks in advance for any help!
> 
> Larry
> 



Last Reminder: Google Summer of Code Applications

2007-03-25 Thread Michael Parker
One last reminder for students.

The Google Summer of Code application deadline has been extended to
March 26th 5pm PDT.

Get your proposals/applications in NOW if you would like to participate.
$4500 could be yours as well as an opportunity to work on SpamAssassin
this summer.

Here is the Apache GSoC wiki page:
http://wiki.apache.org/general/SummerOfCode2007

It already has several ideas for possible projects, but don't feel
limited by our list, make up your own proposal if you would like.

Thanks
Michael Parker


Students: Get Paid to Hack on SpamAssassin

2007-03-20 Thread Michael Parker
Howdy,

Just a reminder, the Apache Software Foundation (which SpamAssassin is a
part of) is participating in the Google Summer of Code again this year.

That means that you can get paid, up to $4500, this summer for working
on SpamAssassin.

The deadline is quickly approaching, you have until *March 24th* to sign
up and submit your proposals.

You can find instructions for getting signed up along with a list of
possible projects here (just search for spamassassin):

http://wiki.apache.org/general/SummerOfCode2007

That is by no means an exhaustive list so if you have other ideas or
know of something from here:

http://wiki.apache.org/spamassassin/WeLoveVolunteers

that you would like to work on, feel free to add it to the list and
submit an application.

Thanks
Michael Parker


Google Summer of Code 2007 - Students Wanted

2007-03-16 Thread Michael Parker
Howdy,

The time of year for Google Summer of Code has already arrived and once
again the Apache Software Foundation is taking part.

We are currently looking for students who wish to work on SpamAssassin
related projects over the summer.

You have until *March 24th* to sign up and submit an application.  Work
on the project will take place from May 28th through August 20th.

You can find a list of possible projects here (just search for
spamassassin):

http://wiki.apache.org/general/SummerOfCode2007

That is by no means an exhaustive list so if you have other ideas or
know of something from here:

http://wiki.apache.org/spamassassin/WeLoveVolunteers

that you would like to work on, feel free to add it to the list and
submit an application.

Last year we were able to take on several projects; it's a nice way to
earn 4500 USD over the summer.

Thanks
Michael Parker


Re: Undefined subroutine &Mail::SpamAssassin::Plugin::DBI::dbi

2007-02-24 Thread Michael Parker
Michael Monnerie wrote:
>> Either a) you have something goofed up there or b) something is
>> goofed in how we setup the INC path for plugins.
> 
> Something must have changed that breaks DBIPlugin, because at 3.1.7 
> I don't have that error.
> 

Please file a bug in Bugzilla.  It might be something with the plugin
but I suspect something with how we are figuring out the INC path.

Thanks
Michael


Re: Undefined subroutine &Mail::SpamAssassin::Plugin::DBI::dbi

2007-02-19 Thread Michael Parker
Chris wrote:
> On Monday 19 February 2007 6:06 pm, Theo Van Dinter wrote:
>> On Mon, Feb 19, 2007 at 05:50:27PM -0600, Chris wrote:
>>> This was the output of my sa-update cronjob this morning:
>>>
>>> Undefined subroutine &Mail::SpamAssassin::Plugin::DBI::dbi called
>>> at /etc/mail/spamassassin/DBI.pm line 162.
>>>
>>> I take it this has to do with the new option --allowplugins?
>> Nope.  It looks like your setup is just messed up.  For instance, what is
>> /etc/mail/spamassassin/DBI.pm and what are you trying to do with it?  :)
> 
> Uh, you're right, I have no idea why I installed it, I don't even remember 
> when. I know thats kind of a lame answer, but its the truth.
> 


Persistent DBI connections.


What does grep DBI *.pre say?

Either a) you have something goofed up there or b) something is goofed
in how we set up the INC path for plugins.

Michael


Re: Bayes db size....

2007-02-17 Thread Michael Parker
Dave Koontz wrote:
> I am sure this has been asked numerous times before, but what is the logic
> in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
> have been removed from the DB there is little to no use for 'unlearning' any
> associated messages.  Besides on a busy system, this seen file gets large
> very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.
> 

Patches welcome.

Michael


> 
> -Original Message-
> From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 16, 2007 7:19 PM
> To: spam mailling list
> Subject: Re: Bayes db size
> 
> On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
>> So you're saying that right now seen isn't capped like tokens right?
> 
> seen has no max size nor expiry features.
> 
> --
> Randomly Selected Tagline:
> "Like any French restaurant in America, it was overpriced, noisy, moody,
> and would put you in mortal danger if you had an accident with anything
> larger than a croissant." - Unknown about the Renault LeCar
> 
> 



Re: Export and append Bayes DB

2007-02-16 Thread Michael Parker
Sam Przyswa wrote:
> Hi,
> 
> Is it possible to export a Bayes DB from a server and then append (not
> restore) it to others servers ?
> 

No, you generally can't combine two bayes databases that way.  Best bet
is to pick the most complete one and use it.

For more details see a really long post on the users mailing list from
me a while back.

Michael


Re: Doubt with user_scores_sql_custom_query

2007-01-30 Thread Michael Parker
Kim Christensen wrote:
> * Jorge Cardona <[EMAIL PROTECTED]> [2007-01-29 23:48:52 -0500]:
> 
>> Hi.
>> I got a question about this parameter, the spamassassin documentation
>> tells this:
>>
>> 1) Current default query:
>>SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_
>> OR username = '@GLOBAL' ORDER BY username ASC
>>
>> 2) Use global and then domain level defaults:
>>SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_
>> OR username = '@GLOBAL' OR username = '@~'||_DOMAIN_ ORDER BY username
>> ASC
>>
>> 3) Maybe global prefs should override user prefs:
>>SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_
>> OR username = '@GLOBAL' ORDER BY username DESC
>>
>> In 1) and 3) is possible that the query return a table with 2 users
>> ,@GLOBAL, and the user that call spamassassin, what i understand it's
>> that SA only use the preferences for the first user , thats why in 3)
>> the global overrride the user prefs ("ORDER BY username DESC").
>>
>> What i can't understand is whats do the 2) query, and also his
>> description, "Use global and then domain level defaults" .
>> Spamassassin use all the prefs from the @GLOBAL and work with it, and
>> after that use the Domains prefs and wort again, thats what its does?
>> or take the @GLOBAL prefs, and then override the prefs with the
>> Domains prefs.?
>>
>> Please, can anyone explain to me this?
> 
> As Michael said, your query will simply return ALL rows matching your
> control statements. You need to limit the query to return the first
> matching row, by simply adding "LIMIT 1" to the end of the query.
> 
> SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ \
>   OR username = '@GLOBAL' OR username = '@~'||_DOMAIN_ \
>   ORDER BY username ASC LIMIT 1
> 

You DON'T want to do this.

If you did that then you could only have 1 config row.

The given examples work well for most use cases.
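
A quick way to see the problem with LIMIT 1 is to run the domain-level
query against a throwaway table.  This is only a sketch: the table name,
the rows, and the split of each setting into preference/value columns
are assumptions made for illustration, and SQLite stands in for whatever
database you actually use.

```python
import sqlite3

# Hypothetical table and rows; only the query shape comes from the
# documentation quoted in this thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE userpref (username TEXT, preference TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO userpref VALUES (?, ?, ?)",
    [
        ("user@example.com", "score", "FOO 1"),
        ("@GLOBAL", "score", "FOO 50"),
        ("@~example.com", "score", "FOO 75"),
    ],
)

# _USERNAME_ and _DOMAIN_ are replaced by bind parameters here.
QUERY = (
    "SELECT preference, value FROM userpref "
    "WHERE username = ? OR username = '@GLOBAL' OR username = '@~' || ? "
    "ORDER BY username ASC"
)
params = ("user@example.com", "example.com")

all_rows = conn.execute(QUERY, params).fetchall()
limited_rows = conn.execute(QUERY + " LIMIT 1", params).fetchall()

print(all_rows)      # global row, then domain row, then user row
print(limited_rows)  # a single row; the other settings never load
```

With LIMIT 1 the domain and user rows are never returned, so their
settings cannot override anything.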

Michael


> I guess you could call this method ghetto-XOR :-)
> 
> This would return a row for the matching username, fall back to the
> global user, and last fall back to the default domain settings.
> 
> But why would you want the global user to have precedence over the
> domain specific settings for the requested user? My tip is to have the
> following hierarchical order:
> 
>   1: User specific settings
>   2: Domain specific settings (if user settings are non-existant)
>   3: Global settings (if neither user or domain settings exist)
> 
> 
> Best of luck



Re: Doubt with user_scores_sql_custom_query

2007-01-29 Thread Michael Parker
Jorge Cardona wrote:
> Hi.
> I got a question about this parameter, the spamassassin documentation
> tells this:
> 
> 1) Current default query:
>SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_
> OR username = '@GLOBAL' ORDER BY username ASC
> 
> 2) Use global and then domain level defaults:
>SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_
> OR username = '@GLOBAL' OR username = '@~'||_DOMAIN_ ORDER BY username
> ASC
> 
> 3) Maybe global prefs should override user prefs:
>SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_
> OR username = '@GLOBAL' ORDER BY username DESC
> 
> In 1) and 3) is possible that the query return a table with 2 users
> ,@GLOBAL, and the user that call spamassassin, what i understand it's
> that SA only use the preferences for the first user , thats why in 3)
> the global overrride the user prefs ("ORDER BY username DESC").
> 
> What i can't understand is whats do the 2) query, and also his
> description, "Use global and then domain level defaults" .
> Spamassassin use all the prefs from the @GLOBAL and work with it, and
> after that use the Domains prefs and wort again, thats what its does?
> or take the @GLOBAL prefs, and then override the prefs with the
> Domains prefs.?
> 
> Please, can anyone explain to me this?

They override each other, so for instance let's say you do a query and
get back something like this:

@GLOBAL score FOO 50
@~example.com score FOO 75
[EMAIL PROTECTED] score FOO 1

The score for FOO would go from 50 to 75 to finally 1.

It's just as if you had written the following lines in a .cf file:

score FOO 50
score FOO 75
score FOO 1

score FOO 1 would win because it was the last one parsed.  The text
selected by the SQL query is simply fed into the config parser just as
if it had been read from a file.
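
The "last one parsed wins" behaviour can be sketched in a few lines.
This is a toy stand-in for the config parser, not the real thing: the
function name is hypothetical and it only understands score lines.

```python
def apply_pref_rows(rows):
    """Apply (preference, value) rows in order; later rows overwrite earlier ones."""
    scores = {}
    for preference, value in rows:
        if preference == "score":
            rule, points = value.split()
            scores[rule] = float(points)  # a later row replaces an earlier one
    return scores


# Rows in the order the query returned them: global, domain, user.
rows = [("score", "FOO 50"), ("score", "FOO 75"), ("score", "FOO 1")]
print(apply_pref_rows(rows))
```

Reversing the row order (ORDER BY username DESC) would leave FOO at 50,
which is how the global prefs end up overriding the user prefs.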

Michael


> 
> Forgive me if the answer it obvious, i really need to be sure with this.
> 
> 



Re: bayes store PgSQL error

2007-01-24 Thread Michael Parker
Tom Allison wrote:
> [1174] dbg: bayes: using username: tallison
> [1174] dbg: bayes: unable to connect to database: missing "=" after
> "bayes:192.168.0.100:5432" in connection info string
> 
> 
> bayes_store_module Mail::SpamAssassin::BayesStore::PgSQL
> bayes_sql_dsn  DBI:Pg:bayes:192.168.0.100:5432
> bayes_sql_username tallison
> 
> 
> 
> I get this from spamassassin -D < sample-spam.txt
> 
> It can't be a spamassassin bug, but I'm not sure what the deal is. 
> According to the docs I'm doing this correctly.  BTW, adding '=' isn't
> the solution.
> 

perldoc DBD::Pg and make sure you are using a proper DSN string.
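
For reference, DBD::Pg expects key=value pairs separated by semicolons
after the driver name, which is why the error complains about a missing
"=".  A minimal sketch of building such a string (the helper name is
hypothetical; the values are the ones from the config above):

```python
def pg_dsn(dbname: str, host: str, port: int) -> str:
    """Build a DBD::Pg style DSN: DBI:Pg:dbname=...;host=...;port=..."""
    return f"DBI:Pg:dbname={dbname};host={host};port={port}"


# The result is what would go after bayes_sql_dsn in the config.
print(pg_dsn("bayes", "192.168.0.100", 5432))
```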

Michael


Re: Problem with mass-check on cygwin

2007-01-18 Thread Michael Parker
Fred T wrote:
> Hello users,
> 
>   I'm getting ready for the mass-check run for rescoring 3.2 and I'm
>   seeing an awful lot of messages like:
> 
> bayes: cannot open bayes databases 
> /cygdrive/E/Temp/spamassassin-trunk/masses/spam
> assassin/bayes_* R/W: lock failed: File exists
> 
> Being on cygwin, what are my options to deal with this problem?
> 
> This message is also cluttering the output from mass-check making it
> difficult to keep an eye on the progress.
> Thank you,
> 

What is your -j value?  If it's > 1 then it's most likely just lock
contention.  Not sure about cygwin, can you use lock_method flock?  That
might help.  You could also see if you can install and use SDBM for your
bayes DB, that will be a bit quicker and possibly avoid the contention.

If your -j value is > 1 you can try running it at just 1 to see how it
does, but might be too slow depending on how many msgs you are checking.

If your -j value is already 1, then it might be something more serious.

Michael


Re: getting Bayes token data from spamassassin

2007-01-17 Thread Michael Parker
Jonas Eckerman wrote:
> Justin Mason wrote:
>> http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_learn
> 
> Thanks!
> 
>> by the way, a nice, working plugin that does this would be quite useful
> 
> Since it was so straight-forward I made a small plugin that collects the raw 
> tokens in a SQL table.
> 

Very nice, that's pretty much what I envisioned when I created the plugin
hooks and very similar to my original proof of concept.

If you wanted to reduce the insert/update time you could also do
something like this:
http://jroller.com/page/dschneller?entry=mysql_replication_using_blackhole_engine


Once you have it like you want it, I suggest posting it to the
CustomPlugins wiki page so others can easily find it.

Michael


Re: getting Bayes token data from spamassassin

2007-01-15 Thread Michael Parker
Stuart Robinson wrote:
> Hello, all.
> 
>> On Mon, Jan 15, 2007 at 01:54:07AM -0800, Stuart Robinson wrote:
>>> I've searched around a bit, both on gmane and Google, but I haven't found
>>> much more information regarding your two points. What IS stored in the
>>> token field of the table bayes_token? And how is the SHA1 hash involved?
>> A SHA1 hash is taken of the original token value, and the bottom 40 bits are
>> used as the token from then-on.  There is a plugin call which can be used to
>> store raw token -> hash value data, but otherwise the raw token information 
>> is
>> lost after the message is processed.
> 
> Where could I find more information about the plugin call that allows me
> to do this? 

perldoc Mail::SpamAssassin::Plugin

You should also search the dev list from a couple of years ago at least.
Lots of discussion about the change and why it was done including, if
memory serves me correctly, a proof of concept plugin to save off the
token values.

> 
>>> Where can I find documentation of this? Any suggestions would be greatly
>>> appreciated.
>> I don't think there's outright documentation about it.  There was a lot of
>> chatter about it on the lists a couple of years ago when the change to
>> using the hash happened.  I recall there being some talk about it recently
>> too, though I can't find it via the archives right now either. :(
> 
> I'll keep looking around. It might be nice to have a configuration option
> that says whether or not to store the raw tokens in the database along
> with their associated hash values.
> 

See the discussion on the dev list.  It was a deliberate choice:
allowing configuration caused a serious performance degradation.  The
compromise was the plugin calls, which actually work quite nicely.
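
To make the hashing described above concrete, here is a small sketch.
It assumes "bottom 40 bits" means the last five bytes of the SHA1
digest; the function name is hypothetical and this is not the
SpamAssassin code itself.

```python
import hashlib


def bayes_token_key(raw_token: bytes) -> bytes:
    """Return the low 40 bits (last 5 bytes) of the SHA1 digest of a token."""
    return hashlib.sha1(raw_token).digest()[-5:]


key = bayes_token_key(b"example")
print(len(key) * 8)  # number of bits kept
```

Because only a slice of a one-way digest is stored, the raw token cannot
be recovered from the database; it has to be captured at learn time,
which is what the plugin call is for.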

Michael


Re: Unknown option: a when restarting spamd

2007-01-05 Thread Michael Parker
Geoff Soper wrote:
> I've recently moved from calling spamassasin to using spamc/spamd. Today
> I had a "/etc/rc.d/init.d/spamd restart" fail with the message "Unknown
> option: a". Googling this led me to the /etc/sysconfig/spamassassin and
> /etc/rc.d/init.d/spamd files, both of which specified the "a" option
> (the former file overriding the latter file).
> 
> What would the "a" option be and why did the two files specify it when
> spamd doesn't accept it?
> 

The -a option was removed AGES ago.  You need to update your start
scripts and read the README/INSTALL docs.  I forget what version, might
be 3.0 or possibly as far back as 2.55.

Michael


Re: SA-Learn Recover to SQL is slow.

2007-01-04 Thread Michael Parker
Big Wave Dave wrote:
> On 1/3/07, Gary V <[EMAIL PROTECTED]> wrote:
>> >It finally finished the restore.
>> >
>> >For the sake of information to help future users
>> >
>> >The "backup" file being used to restore into the new SQL database was
>> >99MB and took 17hrs to import on my AMD 1.2Ghz machine with 1GB of
>> >RAM.
>> >
>> >Dave
>>
>> Could be your database was not expiring. Probably a good idea to do a
>> --force-expire prior to a backup. Just curious, If you run --force-expire
>> now, what does --dump magic look like?
>>
>> Gary V
>>
> 
> Here are the numbers...
> [EMAIL PROTECTED] ~]# sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        253          0  non-token data: nspam
> 0.000          0        580          0  non-token data: nham
> 0.000          0    3637103          0  non-token data: ntokens
> 0.000          0 1167206400          0  non-token data: oldest atime
> 0.000          0 1167890964          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0 1167891012          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
> reduction count
> [EMAIL PROTECTED] ~]# sa-learn --force-expire
> [EMAIL PROTECTED] ~]# sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        253          0  non-token data: nspam
> 0.000          0        580          0  non-token data: nham
> 0.000          0    3637103          0  non-token data: ntokens
> 0.000          0 1167206400          0  non-token data: oldest atime
> 0.000          0 1167890964          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0 1167891646          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
> [EMAIL PROTECTED] ~]#
> 
> It would appear to me as if it hasn't changed the number of tokens at all.
> 

Run with -D, it will probably tell you there wasn't enough difference to
run the expire.  The SQL import works just like you were learning the
tokens, so the atimes are updated accordingly.  Over time the atime
differences will be enough that you are able to expire.

Someone else mentioned it but I'll follow up: your auto-expire has
probably been broken for some time (do you use MailScanner or Amavis or
something like that?).  Before you back up, you should run sa-learn
--force-expire to clear things out.  It's obviously too late for that now.

Give it a few days to update the database and you'll be able to start
expiring out data.  It may take a few weeks for your database to get
enough diversity in the atimes to get down to the configured 150k token
level.
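
If you want to keep an eye on the atime spread while you wait, something
like the following sketch can read the numbers out of saved --dump magic
output.  The parser and its names are hypothetical, and it assumes the
value sits in the third whitespace-separated column as in the dump
quoted above.

```python
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        253          0  non-token data: nspam
0.000          0        580          0  non-token data: nham
0.000          0    3637103          0  non-token data: ntokens
0.000          0 1167206400          0  non-token data: oldest atime
0.000          0 1167890964          0  non-token data: newest atime
"""


def parse_dump_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    marker = "non-token data: "
    magic = {}
    for line in text.splitlines():
        if marker not in line:
            continue
        columns, label = line.split(marker, 1)
        magic[label.strip()] = int(columns.split()[2])
    return magic


magic = parse_dump_magic(SAMPLE)
spread_seconds = magic["newest atime"] - magic["oldest atime"]
print(magic["ntokens"], round(spread_seconds / 86400, 1))
```

For the dump above that works out to roughly eight days between the
oldest and newest atime.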

Michael


Re: SA-Learn Recover to SQL is slow.

2007-01-03 Thread Michael Parker
Big Wave Dave wrote:
> 
> 
> What am I missing?
> 
> I'd be thankful for any input.

You're not missing anything.  The import takes a long time to run.  It's
doing a lot of updates which are expensive in SQL.  The good news is
that you can pretty much use the system while it's doing the import
because everything is atomic.

There might be some tuning you could do on your database side that would
speed things up, but that is a much larger discussion.

Michael


Re: sa + bayes/sqlite _performance_? reasons _not_ to use it?

2006-12-31 Thread Michael Parker
snowcrash+spamassassin wrote:
> i'm interested in using sqlite across my 'entire' mail server env.
> currently, exim+dovecot+spamassassin.
> 
> i know sqlite _can_ be used for bayes db in sa.  lots of info on that.
> 
> any reasons it should NOT be used?
> 
> i'm guessing performance, compared to dbm, might be an issue, but
> other than a comment in sql/README.bayes:
> 
> "NOTE: You may
> find that some implementations do not provide a significant advantage
> over using the default DBM implementation."
> 
> i have not found a performance comparison -- QUESTIONS about it, yes.
> but no ANSWERS (yet).
> 
> any references, info, comments?
> 

There are no published performance numbers for using SQLite because it
is so slow I gave up the tests, deciding it was not even worth the
effort.  When I say slow, I mean 15+ hrs to do what even the basic SQL
storage module on MySQL could do in < 5 mins.

This is most likely because a custom storage module for SQLite is
needed, some have pointed this out.

Probably not the answer you wanted, but that's about all there is.

Michael


Re: Spamassassin and Oracle bayesians DB

2006-12-21 Thread Michael Parker
Jose Javier Sianes Ruiz wrote:
> Now I’m studding the possibility to build a very large Bayesian database.
> Due to a huge amount of user I got (over 100,000 and possibly doubled next
> year, with 8MB of Bayesian information each one on theirs Maildirs), I have
> discarded use MySQL or PostgreSQL, my only choice now is Oracle. Is it easy
> to integrate with Spamassassin? How does it works under heavy mail
> concurrency? It seems that row files in bayes_token table will be incredibly
> high (150,000 token entries for each user -> 15,000,000,000 rows), any
> suggestion for building tablespaces? All experiences or comments will be
> very appreciated. Thanks for all.
> 

At the risk of garnering the wrath of Michael Scheidell.


Oracle for Bayesian databases is lightly tested.  I've never tested it
myself, I know others who have.  It will probably be best to create a
custom storage module for your circumstances.  With a custom module it
would be possible to split up the database in such a way that makes it
a) easier to manage and/or b) higher performance.

I'm very interested in general use for Oracle and if you make any
improvements to the existing storage modules I'd be happy to work with
you to get them folded back into the main distribution.

Michael



Re: Help spamassassin + msql user defined rules

2006-12-13 Thread Michael Parker
Gert Horne wrote:
> Hi,
> 
> I need some help.
> 
> I am trying to configure spamassassin to read my user defined rules.
> 
> I want to be able to block messages based on body and subject rules
> defined in a mysql table
> 
> My debug output state that spamassassin is working fine with mysql
> 
> 

Two things.

1) SQL user_prefs do not work with the spamassassin script, they only
work with spamd.  It's not totally clear what you mean by spamassassin
above, but if it's the script then it flat out won't work.

2) You can not put user rules in SQL user_prefs.  If you want to do
something like that you have to change the code.  There is an
unsupported patch somewhere in Bugzilla.

Michael


Re: SA 3.1.7 not picking up SQL-based Bayes

2006-12-03 Thread Michael Parker
C. Bensend wrote:
>> Ahh but you didn't run the command I asked you to run.  You are passing
>> the user: [EMAIL PROTECTED] to SpamAssassin so it will use that as
>> the key for the database, running the command from the command like that
>> way is going to use your unix id as the key.  I'm guessing you changed
>> something in your mail setup to start passing in @domain in addition to
>> the regular unix username.
> 
> Actually, yes, I did, but I don't think it turned out like we
> were expecting (hence I didn't include it, I'm sorry):
> 
> 
> [EMAIL PROTECTED] ~]$ sa-learn -u [EMAIL PROTECTED]   

add the rest of your --dump magic command to that.


> 
> But regardless - won't the user_scores_sql_custom_query I posted
> handle that possibility?  I am _so_ not an SQL guru, but it looks
> correct to me?  I'm never afraid to admit a mistake, so if I'm
> smoking crack here, please step up and say so.  :)
> 

That custom query has nothing to do with bayes or awl sql stuffs.

Michael



> Benny
> 
> 



Re: SA 3.1.7 not picking up SQL-based Bayes

2006-12-03 Thread Michael Parker
C. Bensend wrote:
>> I think its just a slightly confusing message.  If you run:
>> sa-learn -u [EMAIL PROTECTED]
>>
>> Does it show that you have 200 ham and 200 spam in the database?  If so
>> then there is a problem, if not you just need to train it some more.
>>
>> What the WARNING is telling you is that hey this database isn't ready
>> for scoring so I'm not gonna use it.  This is why learning works just
>> fine.  Finish training up the DB and see if it then starts working for
>> you.
>>
>> Michael
>>
>> PS Possibly we should get the warning text changed a bit, feel free to
>> open up a bug so we can track the work, thanks.
> 
> Hi Michael,
> 
> Well, I have the following in the script that runs every now and
> again, to execute sa-learn:
> 
> [EMAIL PROTECTED] ~]$ sa-learn --dump magic | grep "non-token data: nham" |
> awk '{ print $3 }'
> 257526
> [EMAIL PROTECTED] ~]$ sa-learn --dump magic | grep "non-token data: nspam" |
> awk '{ print $3 }'
> 470150
> 
> I'm fairly sure I have enough ham and spam.  :)  Also, I'm watching
> the PostgreSQL logfile when I do that, and it _is_ querying the
> database.
> 

Ahh but you didn't run the command I asked you to run.  You are passing
the user: [EMAIL PROTECTED] to SpamAssassin so it will use that as
the key for the database; running the command from the command line that
way is going to use your unix id as the key.  I'm guessing you changed
something in your mail setup to start passing in @domain in addition to
the regular unix username.

Michael

> Just for argument's sake, I checked for *BAYES* in the spamd logfile,
> and I don't get a single hit.  So, Bayes is definately not working
> for _any_ of the accounts, not just mine.  :(
> 
> Thanks for any insight,
> 
> Benny
> 
> 



Re: SA 3.1.7 not picking up SQL-based Bayes

2006-12-03 Thread Michael Parker
C. Bensend wrote:
> Hey folks,
> 
>I'm finishing up a mailserver upgrade this weekend, and I notice
> that my new SQL-based install isn't picking up on user-based Bayes
> data.  This is on a new, squeaky-clean OpenBSD 4.0-STABLE machine
> running on AMD64, using SpamAssassin 3.1.7 with perl 5.8.8.
> 
> As per spamd -D info:
> 
> 2006-12-03 22:41:53.760956500 [12889] dbg: config: retrieving prefs for
> [EMAIL PROTECTED] from SQL server
> 
> OK, yay, spamd is picking up on the SQL userprefs.
> 
> 2006-12-03 22:41:53.772480500 [12889] dbg: info: user has changed
> 
> Not sure what this means?
> 
> 2006-12-03 22:41:53.774209500 [12889] dbg: bayes: using username:
> [EMAIL PROTECTED]
> 2006-12-03 22:41:53.781308500 [12889] dbg: bayes: database connection
> established
> 2006-12-03 22:41:53.786485500 [12889] dbg: bayes: found bayes db version 3
> 2006-12-03 22:41:53.789654500 [12889] dbg: bayes: unable to initialize
> database for [EMAIL PROTECTED] user, aborting!
> 2006-12-03 22:41:54.117388500 [12889] dbg: bayes: not scoring message,
> returning undef
> 2006-12-03 22:41:54.118260500 [12889] dbg: bayes: opportunistic call
> attempt failed, DB not readable
> 
> Uh.  What does "unable to initialize database" mean?  Spamd has already
> successfully connected to the PostgreSQL database above, right?  So what
> does "initializing database" mean?
> 
> My user_scores_sql_custom_query is as follows, if that makes a
> difference (not sure if that's consulted for Bayes data):
> 
> 
> user_scores_sql_custom_query SELECT preference, value FROM userpref
> WHERE username = _MAILBOX_ OR username = _USERNAME_ OR username =
> '$GLOBAL' ORDER BY username ASC;
> 
> 
> To add insult to injury, learning spam and ham work just fine.
> It's just the Bayes scoring that seems to have issues.
> 
> So.  I'm at a loss at the moment...  My SA install is doing well,
> but not as well as it should, if it's ignoring Bayes.  What info
> can I pass along to help diagnose this problem?

I think it's just a slightly confusing message.  If you run:
sa-learn -u [EMAIL PROTECTED] --dump magic

Does it show that you have at least 200 ham and 200 spam in the database?
If so, then there is a problem; if not, you just need to train it some more.

What the warning is telling you is: hey, this database isn't ready for
scoring, so I'm not gonna use it.  This is also why learning works just
fine.  Finish training up the DB and see if it then starts working for you.
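
The readiness check described above can be sketched in a few lines of
Python.  This is only an illustration: it assumes the column layout that
"sa-learn --dump magic" typically prints (value in the third column), the
sample text is made up, and the 200/200 thresholds are the defaults
mentioned above (bayes_min_ham_num / bayes_min_spam_num).

```python
import re

# Assumed defaults for bayes_min_spam_num / bayes_min_ham_num.
MIN_SPAM = 200
MIN_HAM = 200

# Made-up sample in the style of `sa-learn --dump magic` output.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0        420          0  non-token data: nham
0.000          0      51234          0  non-token data: ntokens
"""

MAGIC_LINE = re.compile(
    r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (nspam|nham)\s*$"
)


def parse_magic(dump_text):
    """Return {'nspam': N, 'nham': M} parsed from dump-magic style text."""
    counts = {}
    for line in dump_text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def ready_for_scoring(counts, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True only when both learned counts reach their minimums."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham


if __name__ == "__main__":
    counts = parse_magic(SAMPLE_DUMP)
    print(counts, "ready:", ready_for_scoring(counts))
```

With the sample above, nspam is below 200, so the database would not be
used for scoring yet even though learning itself succeeds.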

Michael

PS Possibly we should get the warning text changed a bit, feel free to
open up a bug so we can track the work, thanks.

> 
> Thanks much!
> 
> Benny
> 
> 



SQL Performance w/ SpamAssassin

2006-11-28 Thread Michael Parker
Gary V wrote:
> 
> I was curious about a couple of settings that I heard can affect performance
> when using Innodb so I did a few ad hoc tests:
> 
> http://www200.pair.com/mecham/spam/mysqlspeed.txt
> 
> http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/
> 
> http://www.mysql.com/news-and-events/newsletter/2003-11/a000269.html
> 

Thanks Gary,

I've always pointed people elsewhere when it comes to SQL tuning, on the
theory that other places have much better information.

For sure, if you are using SQL in SpamAssassin you're going to want to
be doing some additional tuning on your database server.

Maybe it's time we started up a wiki page that collects a few links and
various information about the SQL performance tweaks that people are
finding work.

If you could get that ball rolling I'm sure others would join in and add
to the wiki page with their own data.

Thanks
Michael


Re: Converting bayes DB to MySQL

2006-11-27 Thread Michael Parker
Dan Bongert wrote:
> I'm in the process of moving my Bayes DB setup out of users' home
> directories (since I'm setting up a separate SpamAssassin server, and
> accessing Bayes via NFS is causing insane amounts of I/O).
> 
> After a bunch of fiddling, I have a MySQL server set up properly, tables
> created, and a spamassassin user set up so I can populate the database.
> 
> I have 432 users, with about 1.6 GB of Bayes data to import (from sa-learn
> --backup). I started the import last Friday around 10am, and it's still
> running (Monday at 1pm), on user 379.
> 
> My question is this: is this normal? I don't really have any SQL
> administration experience, so this is all very new to me. For what it's
> worth, I'm using InnoDB instead of MyISAM tables.

That's probably normal; import takes a while with SQL since it's a lot of
inserts and updates.

Michael


Re: BayesStore/SQL.pm

2006-11-26 Thread Michael Parker
Giampaolo Tomassoni wrote:
> No answer to this?
> 
> Is this the wrong list to ask code details?

I thought I saw an answer to this already... maybe I was mistaken.

> 
> Thanks,
> 
> giampaolo
> 
> From: Giampaolo Tomassoni [mailto:[EMAIL PROTECTED]
>> What is $self->_userid in seen_put() and the like?

_userid is a private variable, generally anything that starts with an
underscore is a private variable.

>>
>> The uid of the process running SpamAssassin (i.e.: amavis) or 
>> the message destinating user?
>>
>> If the first, how can I get the message destinating user from 
>> subclasses of BayesStore/SQL.pm? I mean, in many SQL.pm functions 
>> it seems to me that the context about the message under process 
>> is not available. I would need to get the destinating mailbox 
>> (thereby the destinating user). Is there any way to obtain this?
>>

The internal userid value for BayesStore/SQL implementations is either
a) the username that spamassassin is currently running as or b) the
value of the bayes_sql_override_username if set.

The username value is set either at SpamAssassin object creation time or
if you are running spamd/spamc whatever is passed via -u in spamc.
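
To make the precedence concrete, here is a tiny Python sketch.  It is not
the BayesStore/SQL code, just an illustration of the rule described above;
the function and argument names are made up for the example.

```python
def effective_bayes_username(running_user, override_username=None):
    """Pick the username a SQL Bayes store would key its data on.

    running_user:      the user SpamAssassin is running as (or, with
                       spamd/spamc, whatever was passed via spamc -u).
    override_username: value of bayes_sql_override_username, if set.
    """
    # (b) an explicit override wins whenever it is set
    if override_username:
        return override_username
    # (a) otherwise fall back to the current username
    return running_user
```

So a setup running everything as one system user (amavis, say) keys all
Bayes data on that user unless an override is configured.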

Michael

>> Thanks,
>>
>> ---
>> Giampaolo Tomassoni - IT Consultant
>> Piazza VIII Aprile 1948, 4
>> I-53044 Chiusi (SI) - Italy
>> Ph: +39-0578-21100
>>
>> MAI inviare una e-mail a:
>> NEVER send an e-mail to:
>>  [EMAIL PROTECTED]
>>
> 



Re: How to use --allow-tell?

2006-11-26 Thread Michael Parker
Todd A. Jacobs wrote:
> I was perusing the man pages for spamd in spamassassin 3.1.7, and came
> across something that seems to imply that I can use spamc to tell spamd
> to update a sitewide bayesian database:
> 
> -l, --allow-tell
>   Allow learning and forgetting (to a local Bayes database),
>   reporting and revoking (to a remote database) by spamd. The
>   client issues a TELL command to tell what type of message is
>   being processed and whether local (learn/forget) or remote
>   (report/revoke) databases should be updated.
> 
> However, I can't find any explanation of how to actually *do* this. What
> am I missing here?
> 

Indeed, --allow-tell turns on the TELL command for the spamd protocol.
You can find more about the protocol here:

http://svn.apache.org/repos/asf/spamassassin/trunk/spamd/PROTOCOL

You only need to worry about the specifics of the protocol if you aren't
going to use spamc, since spamc has the commands built in.

From the spamc man page:

-L learn type
Send message to spamd for learning.  The "learn type" can be either
spam, ham or forget.  The exitcode for spamc will be set to 5 if the
message was learned, or 6 if it was already learned.

Note that the "spamd" must run with the "--allow-tell" option for
this to work.


And:

-C report type
Report or revoke a message to one of the configured collaborative
filtering databases.  The "report type" can be either report or revoke.

Note that the "spamd" must run with the "--allow-tell" option for
this to work.


An example might be:

spamc -u  -L spam < spammsg.txt
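
If you want to drive this from a script, a rough Python wrapper could look
like the following.  It is a sketch under assumptions: spamc is on the
PATH, spamd was started with --allow-tell, and the helper names are made
up for the example.  The only facts it relies on are the -u and -L options
and the exit codes 5 and 6 quoted from the man page above.

```python
import subprocess

LEARNED = 5          # spamc exit code: message was learned
ALREADY_LEARNED = 6  # spamc exit code: message had already been learned
LEARN_TYPES = ("spam", "ham", "forget")


def build_spamc_learn_command(username, learn_type):
    """Build the argv for `spamc -u USER -L TYPE`."""
    if learn_type not in LEARN_TYPES:
        raise ValueError("learn type must be one of: " + ", ".join(LEARN_TYPES))
    return ["spamc", "-u", username, "-L", learn_type]


def describe_exit_code(code):
    """Translate the spamc exit code into a short status string."""
    if code == LEARNED:
        return "learned"
    if code == ALREADY_LEARNED:
        return "already learned"
    return "not learned (exit code %d)" % code


def learn_message(username, learn_type, message_bytes):
    """Pipe one raw message to spamc on stdin and report what happened."""
    result = subprocess.run(
        build_spamc_learn_command(username, learn_type),
        input=message_bytes,
    )
    return describe_exit_code(result.returncode)
```

Calling learn_message("someuser", "spam", raw_bytes) is then the scripted
equivalent of the shell example, with the message read from memory rather
than redirected from a file.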

There are also extensions available for Thunderbird and Outlook that do
this for you:

http://sourceforge.net/projects/soc2006spamd/

Michael


Re: subscribing to the users list documentation

2006-10-10 Thread Michael Parker
Email Lists wrote:
> 
> Personally, I would make it stand out in a different yet better way... it
> isn't like I didn't look for it for 15 minutes and I quit being "stupid"
> years ago...
> 
> Or so I thought  ;-)
> 

It's a WIKI!!!  Make it better!!

Michael


ApacheCon 2006 and SpamAssassin

2006-09-14 Thread Michael Parker
Howdy,

This year ApacheCon will be held in Austin, Texas.

http://www.us.apachecon.com/index.html


In addition to all of the other Apache Software Foundation related talks
there will be at least two SpamAssassin talks:

High Performance Apache SpamAssassin
Extending Apache SpamAssassin Using Plugins

You can read more about them here:
http://www.us.apachecon.com/html/sessions.html

I've also scheduled a BOF for Wednesday night.

Multiple SpamAssassin developers will be on hand to answer questions and
what not.

Downtown Austin is a fantastic place for a conference, there is tons to
do, most of it within walking distance of the conference hotel.

Michael


Re: Bayes conversion from DB to SQL question

2006-09-13 Thread Michael Parker
Tim Rosmus wrote:
> I've been running multiple in/out servers using Bayes and the local
> Bayes DB storage on the local machine[s].  Now I am moving Bayes
> to a site wide SQL setup.   My question is on the sa-learn backup/
> restore from DB to SQL...
> 
> Should I backup/restore all local machine Bayes DB's to the central
> SQL server, or should I only pick one machine that seems to have
> the most active Bayes DB, and just move that?
> 

Pick the best one and use that.

Michael


Re: Allowing IMAP/POP to Send Email & United Nations etc....

2006-08-03 Thread Michael Parker
Nigel Frankcom wrote:
> I'll put on my flameproof underwear for this
> 
> There's been a huge amount of crossfire on these/this subject, but I
> don't see how it has anything to do with SA; or am I missing the
> point?
> 
> Different protocols, yet another level of policing, but nothing about
> the fact that SA does a damned fine job of stopping what exists now,
> not what may or may not happen (n) years in the future.
> 
> Just my 2 pence worth
> 
> Nigel
> 

google "marc perkel"

My $.02

Michael


Re: sa-learn slow with Bayes and PostgreSQL

2006-07-11 Thread Michael Parker
Randall Perry wrote:
> I recently updated to the latest SA and at the same time converted bayes
> from file db to PostgreSQL.
> 
> I notice that using sa-learn with SQL now is very slow compared to file db.
> Is this normal, and is accessing the db while scanning mail any slower with
> SQL?
> 
> 

Yes.  Check out the benchmarks here:

http://wiki.apache.org/spamassassin/BayesBenchmarkResults

Michael


Re: SA::DBI plugin and SA 3.1.3

2006-06-23 Thread Michael Parker
Michael Monnerie wrote:
> On Samstag, 24. Juni 2006 02:30 Michael Parker wrote:
>> Add --debug dbiplugin to your startup command line.
> 
> Sorry I checked that already, but forgot to post it:
> # spamd -D dbiplugin -q -c -l -r /var/run/spamd.pid --min-children=2 
> --max-children=15 --min-spare=2
> [27733] dbg: dbiplugin: Creating uncached database handle to 
> 'dbname=zmi_sa_bayes;host=localhost_zmi_sa_bayes_bayes_AutoCommit=0_PrintError=0_Username=zmi_sa_bayes'
> [27733] info: spamd: server started on port 783/tcp (running version 3.1.3)
> [27733] info: spamd: server pid: 27733
> [27746] dbg: dbiplugin: In spamd_child_init
> [27733] info: spamd: server successfully spawned child process, pid 27746
> [27747] dbg: dbiplugin: In spamd_child_init
> [27733] info: spamd: server successfully spawned child process, pid 27747
> [27733] info: prefork: child states: II
> 
> # spamassassin --debug dbiplugin --lint 2>&1
> [27929] dbg: dbiplugin: Creating uncached database handle to 
> 'dbname=zmi_sa_bayes;host=localhost_zmi_sa_bayes_bayes_AutoCommit=0_PrintError=0_Username=zmi_sa_bayes'
> 
> Looks good?
> 
> mfg zmi

The first handle created in spamd will not be cached (really, anything
that isn't created in a child), and handles will never be cached in
spamassassin.  So you'd have to look at the subsequent accesses in spamd
to know for sure.

Michael


Re: SA::DBI plugin and SA 3.1.3

2006-06-23 Thread Michael Parker
Michael Monnerie wrote:
> On Dienstag, 20. Juni 2006 18:09 Michael Parker wrote:
>> You're pointed at the wrong DBI.pm.  I updated the wiki to make it
>> more obvious.
> 
> It's running now, but I can't see any caching happening. Below are some
> log lines. Any ideas?

Add --debug dbiplugin to your startup command line.

Michael


Re: SQL Bayes with Postgres in SUSE9.3

2006-06-22 Thread Michael Parker
Michael Monnerie wrote:
> 
> I would say the docs are not correct, at least to someone who is not a 
> specialist in configuring DBI. I found the info on the DBI man page, 
> but still the docs here are wrong.

You are not reading completely, especially the part that says:

"For an example of connection to PostgreSQL, see the main README file."

Which provides the exact information you are looking for.

But like I said, it's open source; if anyone feels that the documentation
could use a little more information, then they are free to provide patches.

Michael

