Re: bayes autolearn off but journal updated

2009-01-22 Thread Justin Mason
On Thu, Jan 22, 2009 at 02:48, Matt Kettler mkettler...@verizon.net wrote:
 Matus UHLAR - fantomas wrote:

 On 20.01.09 19:45, Matt Kettler wrote:

 Yes, more specifically, it's mostly going to be updating the atime, or
 time of last access, records for tokens. This time is used by the expiry
 process to drop the least recently used tokens.


 What does SA do, if it can't r/w open bayes database? Will it skip BAYES
 checks or just tie it r/o ?

 (I notice occasional missing BAYES in X-Spam headers)

 Well, first let's be clear.. it's R/W opening the journal, not the
 database itself.

 The main _toks and _seen files are only locked R/W if there's one of the
 following going on:
 learning without bayes_learn_to_journal set
 a journal sync
 token expiry is running

 As for write locks to the journal, if for some reason there's a
 conflict, the update is just dropped with a warning. This isn't
 incredibly likely unless your bayes is really busy, as journal updates
 are pretty short in nature.

on POSIX filesystems, this should be virtually impossible, since the
file is opened for append with atomic writes.
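
A minimal sketch of the append-mode idea, in Python rather than SpamAssassin's Perl. It assumes a local POSIX filesystem; the record format and the helper names `append_record` and `demo` are made up for illustration, and none of this is SpamAssassin's actual code. Each writer opens the file with O_APPEND and hands the whole record to a single write() call, so the kernel positions every write at end-of-file and short records from different writers do not overwrite each other.

```python
import os
import tempfile
import threading

def append_record(path, record):
    """Append one record with a single write() on an O_APPEND descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        return os.write(fd, record)
    finally:
        os.close(fd)

def demo(writers=8, per_writer=200):
    """Let several threads append concurrently, then return the lines read back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        def work(n):
            for i in range(per_writer):
                # Hypothetical record layout: "t <writer> <sequence>\n"
                append_record(path, b"t %d %d\n" % (n, i))
        threads = [threading.Thread(target=work, args=(n,)) for n in range(writers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path, "rb") as fh:
            return fh.read().splitlines()
    finally:
        os.unlink(path)

if __name__ == "__main__":
    lines = demo()
    print(len(lines), "records read back")
```

Note that this guarantee does not carry over to NFS, where concurrent appends from several clients can still collide.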

--j.

 Have a look at /lib/Mail/SpamAssassin/BayesStore/DBM.pm and find sub
 cleanup in it.

 Snippets of that code:

   my $path = $self->_get_journal_filename();
   ...

   if (!open (OUT, ">>".$path)) {
     warn "bayes: cannot write to $path, bayes db update ignored: $!\n";
     umask $umask; # reset umask
     return;
   }






Re: bayes autolearn off but journal updated

2009-01-22 Thread Paweł Sasin
  Yes, more specifically, it's mostly going to be updating the
  atime, or time of last access, records for tokens. This time is
  used by the expiry process to drop the least recently used tokens.
 
 
  What does SA do, if it can't r/w open bayes database? Will it skip
  BAYES checks or just tie it r/o ?
 
  (I notice occasional missing BAYES in X-Spam headers)
 
  Well, first let's be clear.. it's R/W opening the journal, not the
  database itself.
 
  The main _toks and _seen files are only locked R/W if there's one
  of the following going on:
  learning without bayes_learn_to_journal set
  a journal sync
  token expiry is running
 
  As for write locks to the journal, if for some reason there's a
  conflict, the update is just dropped with a warning. This isn't
  incredibly likely unless your bayes is really busy, as journal
  updates are pretty short in nature.
 
 on POSIX filesystems, this should be virtually impossible, since the
 file is opened for append with atomic writes.

It is quite common on Solaris with 40+ working spamds and really high
traffic volume. We had exactly this situation some time ago: the spamds
were contending for the journal lock (auto_learn and auto_expire
disabled) instead of moving on to the next message. The machine was 50%
idle but could not handle more messages; the bottleneck was the journal
updates.

-- 
Paweł Sasin



Re: bayes autolearn off but journal updated

2009-01-22 Thread Matus UHLAR - fantomas
  On 20.01.09 19:45, Matt Kettler wrote:
  Yes, more specifically, it's mostly going to be updating the atime, or
  time of last access, records for tokens. This time is used by the expiry
  process to drop the least recently used tokens.

  Matus UHLAR - fantomas wrote:
  What does SA do, if it can't r/w open bayes database? Will it skip BAYES
  checks or just tie it r/o ?
 
  (I notice occasional missing BAYES in X-Spam headers)

 On Thu, Jan 22, 2009 at 02:48, Matt Kettler mkettler...@verizon.net wrote:
  Well, first let's be clear.. it's R/W opening the journal, not the
  database itself.

well, sorry, OK.

  As for write locks to the journal, if for some reason there's a
  conflict, the update is just dropped with a warning. This isn't
  incredibly likely unless your bayes is really busy, as journal updates
  are pretty short in nature.

Yes, this is what I wanted to know...

On 22.01.09 09:47, Justin Mason wrote:
 on POSIX filesystems, this should be virtually impossible, since the
 file is opened for append with atomic writes.

We have mailboxes on NFS, accessed from several machines; I guess that
may be the reason.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I feel like I'm diagonally parked in a parallel universe. 


Re: bayes autolearn off but journal updated

2009-01-22 Thread Justin Mason
On Thu, Jan 22, 2009 at 10:05, Paweł Sasin hanni...@wp-sa.pl wrote:
  Yes, more specifically, it's mostly going to be updating the
  atime, or time of last access, records for tokens. This time is
  used by the expiry process to drop the least recently used tokens.
 
 
  What does SA do, if it can't r/w open bayes database? Will it skip
  BAYES checks or just tie it r/o ?
 
  (I notice occasional missing BAYES in X-Spam headers)
 
  Well, first let's be clear.. it's R/W opening the journal, not the
  database itself.
 
  The main _toks and _seen files are only locked R/W if there's one
  of the following going on:
  learning without bayes_learn_to_journal set
  a journal sync
  token expiry is running
 
  As for write locks to the journal, if for some reason there's a
  conflict, the update is just dropped with a warning. This isn't
  incredibly likely unless your bayes is really busy, as journal
  updates are pretty short in nature.

 on POSIX filesystems, this should be virtually impossible, since the
 file is opened for append with atomic writes.

 It is quite common on Solaris with 40+ working spamds and really high
 traffic volume. Some time ago we had such situation. The server had 50%
 idle while the spamds were striving to lock the journal (auto_learn and
 auto_expire disabled) rather than going on to handle a next message. Ie
 the machine was 50% idle but was unable to handle more messages and the
 bottleneck was in journal updates.

You definitely mean the journal, right?  not the bayes dbs?
interesting to hear this, I haven't encountered it before...

--j.


Re: bayes autolearn off but journal updated

2009-01-21 Thread Matus UHLAR - fantomas
  On Tue, Jan 20, 2009 at 04:49:12PM +0100, Matus UHLAR - fantomas wrote:

  Why does it update the journal? Why does it try to open journal in R/W 
  mode?

 Theo Van Dinter wrote:
  $ man sa-learn

Oh, sorry for missing that in docs :(

  In other words, the journal isn't just for learning.

On 20.01.09 19:45, Matt Kettler wrote:
 Yes, more specifically, it's mostly going to be updating the atime, or
 time of last access, records for tokens. This time is used by the expiry
 process to drop the least recently used tokens.

What does SA do if it can't open the bayes database r/w? Will it skip BAYES
checks or just tie it r/o?

(I notice occasional missing BAYES in X-Spam headers)


Re: bayes autolearn off but journal updated

2009-01-21 Thread Matt Kettler
Matus UHLAR - fantomas wrote:

 On 20.01.09 19:45, Matt Kettler wrote:
   
 Yes, more specifically, it's mostly going to be updating the atime, or
 time of last access, records for tokens. This time is used by the expiry
 process to drop the least recently used tokens.
 

 What does SA do, if it can't r/w open bayes database? Will it skip BAYES
 checks or just tie it r/o ?

 (I notice occasional missing BAYES in X-Spam headers)
   
Well, first let's be clear.. it's R/W opening the journal, not the
database itself.

The main _toks and _seen files are only locked R/W if there's one of the
following going on:
 learning without bayes_learn_to_journal set
 a journal sync
 token expiry is running

As for write locks to the journal, if for some reason there's a
conflict, the update is just dropped with a warning. This isn't
incredibly likely unless your bayes is really busy, as journal updates
are pretty short in nature.

Have a look at /lib/Mail/SpamAssassin/BayesStore/DBM.pm and find sub
cleanup in it.

Snippets of that code:

  my $path = $self->_get_journal_filename();
  ...

  if (!open (OUT, ">>".$path)) {
    warn "bayes: cannot write to $path, bayes db update ignored: $!\n";
    umask $umask; # reset umask
    return;
  }
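
The same logic as a small Python sketch, for readers who do not read Perl. The function name `write_journal_update`, the `file_mode` default and the True/False return convention are illustrative assumptions, not SpamAssassin's API. It tries to open the journal for append and, if that fails, warns and drops the update instead of holding up the scan.

```python
import os
import sys

def write_journal_update(path, data, file_mode=0o600):
    """Append data to the journal; on open failure warn and drop the update.

    Returns True if the data was handed to the OS, False if it was dropped.
    """
    # Temporarily set the umask so a newly created journal ends up with file_mode.
    old_umask = os.umask(0o777 & ~file_mode)
    try:
        try:
            fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o666)
        except OSError as exc:
            print(f"bayes: cannot write to {path}, bayes db update ignored: {exc}",
                  file=sys.stderr)
            return False
    finally:
        os.umask(old_umask)  # always reset the umask, on success or failure
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    return True
```

The point of the design is that a scan never waits on the journal: a failed open costs one lost atime update, nothing more.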





bayes autolearn off but journal updated

2009-01-20 Thread Matus UHLAR - fantomas
Hello,

on my systems I turned bayes filter off by default:

cd /etc/mail/spamassassin/
grep bayes *

local.cf:use_bayes 0
local.cf:bayes_auto_learn 0
local.cf:bayes_auto_expire 0
local.cf:bayes_learn_to_journal 1

...I keep journalling on by default so that any user who turns on bayes
uses the journal even for manual learning.

One of the users has BAYES turned on, without changing the value of
auto_learn or anything else:

# we will be filling the bayes database...
use_bayes 1
bayes_auto_learn 0
bayes_auto_expire 0

However, this user's bayes_journal keeps being changed, even without manual
intervention. I also occasionally get this error in the logs:

Jan 20 16:33:22 t02 spamd[5073]: bayes: cannot open bayes databases 
/.../.spamassassin/bayes_* R/W: lock failed: File exists

Why does it update the journal? Why does it try to open journal in R/W mode?



Re: bayes autolearn off but journal updated

2009-01-20 Thread Theo Van Dinter
On Tue, Jan 20, 2009 at 04:49:12PM +0100, Matus UHLAR - fantomas wrote:
 Why does it update the journal? Why does it try to open journal in R/W mode?

$ man sa-learn
[...]
   bayes_journal
       While SpamAssassin is scanning mails, it needs to track which tokens
       it uses in its calculations.  To avoid the contention of having each
       SpamAssassin process attempting to gain write access to the Bayes DB,
       the token timestamps are written to a 'journal' file which will later
       (either automatically or via sa-learn --sync) be used to synchronize
       the Bayes DB.

In other words, the journal isn't just for learning.
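
To make the man page's description concrete, here is a rough Python sketch of the idea: scanners only append "token was used at time T" records to a journal, and a separate sync step folds them into the database. The dict standing in for the Bayes DB, the tuple record format and the function names are all assumptions for illustration; SpamAssassin's real on-disk format differs.

```python
from typing import Dict, List, Tuple

JournalEntry = Tuple[str, int]  # (token, access time); hypothetical layout

def record_token_use(journal: List[JournalEntry], tokens: List[str], now: int) -> None:
    """Scan-time path: append only, never touch the main DB."""
    for tok in tokens:
        journal.append((tok, now))

def sync_journal(db_atime: Dict[str, int], journal: List[JournalEntry]) -> int:
    """Sync path: apply newer atimes to known tokens, then empty the journal.

    Returns the number of atime updates that were applied.
    """
    applied = 0
    for tok, atime in journal:
        # Only tokens already in the DB get their atime refreshed in this sketch.
        if tok in db_atime and atime > db_atime[tok]:
            db_atime[tok] = atime
            applied += 1
    journal.clear()
    return applied
```

Only the sync step needs write access to the database, so many scanning processes can run without fighting over a DB write lock.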



Re: bayes autolearn off but journal updated

2009-01-20 Thread Matt Kettler
Theo Van Dinter wrote:
 On Tue, Jan 20, 2009 at 04:49:12PM +0100, Matus UHLAR - fantomas wrote:
   
 Why does it update the journal? Why does it try to open journal in R/W mode?
 

 $ man sa-learn
 [...]
    bayes_journal
        While SpamAssassin is scanning mails, it needs to track which tokens
        it uses in its calculations.  To avoid the contention of having each
        SpamAssassin process attempting to gain write access to the Bayes DB,
        the token timestamps are written to a 'journal' file which will later
        (either automatically or via sa-learn --sync) be used to synchronize
        the Bayes DB.

 In other words, the journal isn't just for learning.

   
Yes, more specifically, it's mostly going to be updating the atime, or
time of last access, records for tokens. This time is used by the expiry
process to drop the least recently used tokens.
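
A toy Python sketch of why the atime matters for expiry. The dict layout and the `expire_tokens` name are hypothetical, and real expiry in SpamAssassin is more involved than a fixed count; this only shows that the tokens with the oldest last-access time are the ones dropped.

```python
from typing import Dict

def expire_tokens(db_atime: Dict[str, int], keep: int) -> Dict[str, int]:
    """Keep the `keep` most recently used tokens and drop the rest."""
    if keep <= 0:
        return {}
    # Sort by last-access time, newest first, and keep the head of the list.
    newest_first = sorted(db_atime.items(), key=lambda kv: kv[1], reverse=True)
    return dict(newest_first[:keep])
```

Without the atime updates from scanning, every token would look equally stale and expiry could not tell useful tokens from dead ones.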



Bayes and last journal sync atime

2009-01-06 Thread Kai Schaetzl
I find that last journal sync atime is 0 on my Bayes setups that use 
MySQL. So, can I assume that there is no journal (well, there's no table 
and file for it, anyway) and stuff is added directly to the database? 
(which makes sense).
However, looking at my setups that still use dbm files I find that the 
last journal sync atime is completely wrong on them. e.g. if I do a
sa-learn --sync the last journal sync atime doesn't change and it's 
months old.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com





Re: sa-learn journal location for teaching spamassassin on multiple hosts

2008-11-17 Thread Samy Ascha, Xel Media B.V.


Hey Jake,

Thx for your reply. I got this same tip off-list (from Jonas Eckerman). I
liked the idea and I have already done some successful testing of
centralized bayes-data storage in a MySQL database.

We are using an SQL back-end for storing 'all things e-mail' anyway, so
this was easily fitted in.

I will be rolling stuff out as soon as it is ready for production.

Also, the READMEs in the distribution were very useful for setting this
up. I did not need any other resources and there were zero issues.

Thx to Jonas, Jake and the list for helping out, gj ;)

Regards,
Samy

I'm keeping these full messages in here, as they may present a (kinda)
full problem and solution for others having similar issues.


On Nov 11, 2008, at 11:51 PM, Jake Maul wrote:

On Fri, Nov 7, 2008 at 4:45 AM, Samy Ascha, Xel Media B.V. [EMAIL PROTECTED] wrote:
I have recently set up a mailbox and a sa-learn script to start teaching
SpamAssassin. This was all no problem, but:

We have an MX group of usually about 3 MTAs, which all run their own content
filter (amavis) and thus use their own SpamAssassin's database. When we are
gonna start teaching SpamAssassin with sa-learn, I need to somehow sync the
results in the journal to all these hosts.

I've checked out the --no-sync and --sync options and I think these options
will give me exactly the tools I need for this job.

I need to know the location of the journal though and I need to know if
there are any pitfalls when syncing a SpamAssassin with a journal from
another one on another server.

Has anyone got experience with syncing sa-learn between multiple MTAs? How
did you solve this? Can SA sync with a journal in an arbitrary location, or
does it look for it in one preconfigged place?

I hope u have some interesting thoughts about this issue.


Ultimately, you're not syncing 'sa-learn', you're syncing the bayes'
DB that sa-learn (and spamd) records to. There's a few ways to go
about sharing the bayesian database. Probably the best bet would be to
store the bayes DB in MySQL, and point SA on all 3 servers to it-
ideally with the database on a 4th server (hey, you can put the AWL
info into MySQL as well... may as well hit that up at the same time).

You could probably go the --sync and --no-sync route if you fiddled
with it enough (never tried it), but honestly a single MySQL DB for
bayes would probably be a lot simpler if you have any experience at
all with MySQL. It's been good for performance for us even when used
on a single server, and it's pretty bulletproof for us- been in use
for years. The only tip you really need here is to run OPTIMIZE TABLE
every now and then.

An alternative hacky solution: turn off autolearn on 2 of the 3, and
do sa-learns and autolearning on the 3rd. Then nightly rsync all the
bayes DB files over to the other 2 servers and restart spamd. Not
pretty, but it should work.

Jake




Re: sa-learn journal location for teaching spamassassin on multiple hosts

2008-11-11 Thread Jake Maul
On Fri, Nov 7, 2008 at 4:45 AM, Samy Ascha, Xel Media B.V. [EMAIL PROTECTED] 
wrote:
 I have recently setup a mailbox and a sa-learn script to start teaching
 SpamAssassin. This was all no problem, but:

 We have an MX group of usually about 3 MTAs, which all run their own content
 filter (amavis) and thus use their own SpamAssassin's database. When we are
 gonna start teaching SpamAssassin with sa-learn, I need to somehow sync the
 results in the journal to all these hosts.

 I've checked out the --no-sync and --sync options and I think these options
 will give me exactly the tools I need for this job.

 I need to know the location of the journal though and I need to know if
 there are any pitfalls when syncing a SpamAssassin with a journal from
 another one on another server.

 Has anyone got experience with syncing sa-learn between multiple MTAs? How
 did you solve this? Can SA sync with a journal in an arbitrary location, or
 does it look for it in one preconfigged place?

 I hope u have some interesting thoughts about this issue.

Ultimately, you're not syncing 'sa-learn', you're syncing the bayes'
DB that sa-learn (and spamd) records to. There's a few ways to go
about sharing the bayesian database. Probably the best bet would be to
store the bayes DB in MySQL, and point SA on all 3 servers to it-
ideally with the database on a 4th server (hey, you can put the AWL
info into MySQL as well... may as well hit that up at the same time).

You could probably go the --sync and --no-sync route if you fiddled
with it enough (never tried it), but honestly a single MySQL DB for
bayes would probably be a lot simpler if you have any experience at
all with MySQL. It's been good for performance for us even when used
on a single server, and it's pretty bulletproof for us- been in use
for years. The only tip you really need here is to run OPTIMIZE TABLE
every now and then.

An alternative hacky solution: turn off autolearn on 2 of the 3, and
do sa-learns and autolearning on the 3rd. Then nightly rsync all the
bayes DB files over to the other 2 servers and restart spamd. Not
pretty, but it should work.

Jake


sa-learn journal location for teaching spamassassin on multiple hosts

2008-11-07 Thread Samy Ascha, Xel Media B.V.


Dear members,

I have recently setup a mailbox and a sa-learn script to start  
teaching SpamAssassin. This was all no problem, but:


We have an MX group of usually about 3 MTAs, which all run their own  
content filter (amavis) and thus use their own SpamAssassin's  
database. When we are gonna start teaching SpamAssassin with sa-learn,  
I need to somehow sync the results in the journal to all these hosts.


I've checked out the --no-sync and --sync options and I think these  
options will give me exactly the tools I need for this job.


I need to know the location of the journal though and I need to know  
if there are any pitfalls when syncing a SpamAssassin with a journal  
from another one on another server.


Has anyone got experience with syncing sa-learn between multiple MTAs?  
How did you solve this? Can SA sync with a journal in an arbitrary  
location, or does it look for it in one preconfigged place?


I hope u have some interesting thoughts about this issue.

Thx much and regards,
Samy Ascha

Xel Media Internet Services



Re: sa-learn journal location for teaching spamassassin on multiple hosts

2008-11-07 Thread Matus UHLAR - fantomas
On 07.11.08 12:45, Samy Ascha, Xel Media B.V. wrote:
 I have recently setup a mailbox and a sa-learn script to start  
 teaching SpamAssassin. This was all no problem, but:
 
 We have an MX group of usually about 3 MTAs, which all run their own  
 content filter (amavis) and thus use their own SpamAssassin's  
 database. When we are gonna start teaching SpamAssassin with sa-learn,  
 I need to somehow sync the results in the journal to all these hosts.

We have a group of four MTA servers. However, they don't run SA at the MTA
level (yet). We have users' mailboxes on a shared storage cluster, so their
bayes DB is on shared space.

I'd solve your case by configuring the MTAs without BAYES, or maybe by using
users' configs, if possible. If the mail is sent to one user, that should
not be a problem. For mail sent to several users, some generic configuration
and filtering will be used, so users may want to have the mail rechecked
for spamminess.

 Has anyone got experience with syncing sa-learn between multiple MTAs?  
 How did you solve this? Can SA sync with a journal in an arbitrary  
 location, or does it look for it in one preconfigged place?

I am not sure if it's safe to use journal or bayes DB nfs-mounted...



Understanding Bayes journal sync

2008-08-17 Thread Kai Schaetzl
I have started to use a different method to call SA on some of my machines
than I used in the past because the web interface (ISPConfig) I chose
integrates with SA and clamav (via clamassassin). This is now classic SA
calling via procmail. The other methods I used before and still use on
other machines are MailScanner and a special spamc-like milter. There I
have never seen this problem.

So, what happens is that users get completely blank mail after the first
one or two weeks of use. When I ran sa -D it became apparent that it's
trying to sync the Bayes journal and couldn't acquire a lock because there
already were two lockfiles: bayes_journal.lock and
bayes_journal.FQDN.lock or some such. The journal had grown to about 55
MB. That somehow led to a timeout and the empty mail. Once I removed the
lock files and ran a --sync it took only a few seconds to finish the sync.

I would like to know how this locking problem can happen as it could
frequently spoil the user experience. I assume it could happen (similar
to bayes expiry) when it's time to sync and the sa run or the sync itself
times out and is killed by the procmail process, leaving behind the
lockfile, or so? Other possible causes?

What's the best method to avoid it? There's no setting like
bayes_auto_expire for the journal sync. Should I set the
bayes_journal_max_size to 100 MB or so and then run a nightly sync?

Kai






Re: Understanding Bayes journal sync

2008-08-17 Thread Kai Schaetzl
Kai Schaetzl wrote on Sun, 17 Aug 2008 14:09:17 +0200:

 Should I set the 
 bayes_journal_max_size to 100 MB or so and then run a nightly sync?

I reread the conf page. Of course, 0 would be the correct setting to stop 
the fly-by syncing.
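
Going by the documented behaviour of bayes_journal_max_size (sync about once a day, sooner if the journal grows past the limit, never if the limit is 0), the decision can be sketched in Python as below. The function name, the argument list and the hard-coded one-day interval are assumptions for illustration, not SpamAssassin internals.

```python
DAY_SECONDS = 24 * 60 * 60

def should_opportunistic_sync(journal_size, last_sync, now, max_size=102400):
    """Return True when a scan would trigger a journal sync on the fly."""
    if max_size == 0:
        return False                       # 0 disables opportunistic syncing
    if journal_size > max_size:
        return True                        # journal grew past the limit
    return now - last_sync >= DAY_SECONDS  # otherwise roughly once a day
```

With the limit at 0 nothing syncs the journal during scanning, so a scheduled `sa-learn --sync` has to do it instead.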

Kai






Usage of journal in Bayesian Filtering.

2007-08-30 Thread Srilatha

Hi,

I am trying to understand the usage of the journal in Bayesian filtering.

If bayes_learn_to_journal is set to 1, SA stores newly learnt tokens
in the journal.

When the Bayesian filter is activated, SA reads tokens from BOTH the
'bayes_tokens' database and 'bayes_journal' while scanning a message.

While scanning a message, tokens found in the bayes_tokens database are
written to bayes_journal with a modified timestamp.



Is my understanding correct ?
Please correct me if my understanding is wrong

regards,
Srilatha








Re: My bayes journal just keeps growing

2006-12-14 Thread Theo Van Dinter
On Thu, Dec 14, 2006 at 12:48:34PM +0530, Ramprasad wrote:
 The problem is my bayes_journal file grows immensely ( around 500Mb a
 day ) but the bayes_toks files hardly gets touched

It sounds like syncing is not working for you.

 When I do a bayes-expiry the process seems to hang (after even 3-4
 hours ) and I simply resort to deleting the journal file. Because I cant

Why do you delete the journal, which has nothing to do with expiry?  Have you
run in debug mode to see what is going on?



My bayes journal just keeps growing

2006-12-13 Thread Ramprasad
I run SA 3.1.5 with MailScanner

I have in my cf file
bayes_learn_to_journal  1
use_bayes 1
bayes_path /var/spool/MailScanner/spamassassin/bayes
bayes_file_mode 0666
bayes_auto_expire 0

The problem is my bayes_journal file grows immensely ( around 500Mb a
day ) but the bayes_toks files hardly gets touched

When I do a bayes-expiry the process seems to hang (after even 3-4
hours ) and I simply resort to deleting the journal file. Because I cant
keep waiting for expiry to get complete. (We get a HUGE traffic of
around 7 Million mails a day on 14 loadbalanced servers )

I am looking at MySQL-based bayes, but that will take time to get
implemented.
What is the best way of setting up bayes for high-traffic servers?

Thanks
Ram








Bayes journal problem

2006-01-25 Thread Enrico Morelli
Dear all,

I'm using spamassassin 3.1.0 without problems.
Starting from today I see the following messages in the log files of my
mail server:
Jan 24 16:35:13 alpha spamd[8295]: partial write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal (4040 of 118632), recovering.
Jan 24 16:35:14 alpha spamd[8293]: partial write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal (4040 of 111984), recovering.
Jan 24 16:35:14 alpha spamd[8294]: partial write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal (4040 of 117408), recovering.
Jan 24 16:35:14 alpha spamd[8294]: cannot write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal, aborting!
Jan 24 16:35:14 alpha spamd[8294]: Exiting subroutine via last at /usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 1073.
Jan 24 16:35:14 alpha last message repeated 2 times
Jan 24 16:35:14 alpha spamd[8294]: Exiting eval via last at /usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 1073.
Jan 24 16:35:14 alpha spamd[8295]: partial write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal (4040 of 118632), recovering.

What's happening?


-- 
---
   (o_
(o_//\  Coltivate Linux che tanto Windows si pianta da solo.
(/)_   V_/_
+--+
| ENRICO MORELLI |  email: [EMAIL PROTECTED]   |
| * *   *   *|  phone: +39 055 4574269 |
|  University of Florence|  fax  : +39 055 4574253 |
|  CERM - via Sacconi, 6 -  50019 Sesto Fiorentino (FI) - ITALY|
+--+



Re: Bayes journal problem

2006-01-25 Thread Loren Wilton
Perhaps you have per-user Bayes and the user has gone over-quota on disk
space?

Loren



Re: Bayes journal problem

2006-01-25 Thread Enrico Morelli
On Wed, 25 Jan 2006 03:14:30 -0800
Loren Wilton [EMAIL PROTECTED] wrote:

 Perhaps you have per-user Bayes and the user has gone over-quota on
 disk space?
 
 Loren
 

I checked and yes, I have per-user Bayes and some users were over quota.
Is this the problem?

Thanks a lot.



Re: Bayes journal problem

2006-01-25 Thread Enrico Morelli
On Wed, 25 Jan 2006 12:31:02 +0100
Enrico Morelli [EMAIL PROTECTED] wrote:

 On Wed, 25 Jan 2006 03:14:30 -0800
 Loren Wilton [EMAIL PROTECTED] wrote:
 
  Perhaps you have per-user Bayes and the user has gone over-quota on
  disk space?
  
  Loren
  
 
 I checked and yes I have per-user Bayes and some users was out of
 quota. This is the problem?
 
 Thanks a lot.
 

I added some disk quota for the users who were over quota, but the
problem seems unresolved.

Jan 25 12:31:13 alpha spamd[2021]: bayes: write failed to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal (0 of 263856)!
Jan 25 12:31:13 alpha spamd[2021]: Exiting subroutine via last at /usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 1126.
Jan 25 12:31:13 alpha last message repeated 2 times
Jan 25 12:31:13 alpha spamd[2021]: Exiting eval via last at /usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 1126.




Re: Bayes journal problem

2006-01-25 Thread Loren Wilton
 I add some disk quota to the users that was out of quota, but the
 problem seems unresolved.

 Jan 25 12:31:13 alpha spamd[2021]: bayes: write failed to Bayes
 journal /etc/mail/spamassassin/BAYES/bayes_journal (0 of 263856)! Jan
 25 12:31:13 alpha spamd[2021]: Exiting subroutine via last

This still looks like a quota or permissions problem, or maybe a missing
home directory for some user.  I think this is saying that Bayes could not
write to a file in /etc/mail/spamassassin/BAYES.  I'm guessing this is a
common file rather than a per-user file.  So perhaps the user does not have
permission to write to this directory?

Loren



Re: Bayes journal problem

2006-01-25 Thread Theo Van Dinter
On Wed, Jan 25, 2006 at 04:04:28AM -0800, Loren Wilton wrote:
  Jan 25 12:31:13 alpha spamd[2021]: bayes: write failed to Bayes
  journal /etc/mail/spamassassin/BAYES/bayes_journal (0 of 263856)! Jan
  25 12:31:13 alpha spamd[2021]: Exiting subroutine via last
 This still looks like a quota or permissions problem, or maybe a missing
 home directory for some user.

FWIW, the error occurs when the journal has been opened for writing (so in
theory the permissions and such should be ok), but any attempt to actually put
data into the journal fails, specifically that syswrite() (therefore the
system's write() function) returns an error.

I've attached a patch which you could use against M::SA::BayesStore::DBM
which makes that error message include the system's error string which
should hopefully be useful.

Index: lib/Mail/SpamAssassin/BayesStore/DBM.pm
===================================================================
--- lib/Mail/SpamAssassin/BayesStore/DBM.pm (revision 366688)
+++ lib/Mail/SpamAssassin/BayesStore/DBM.pm (working copy)
@@ -1122,8 +1122,12 @@
 
 # argh, write failure, give up
 if (!defined $len || $len < 0) {
-  $len = 0 unless (defined $len);
-  warn "bayes: write failed to Bayes journal $path ($len of $nbytes)!\n";
+  my $err = '';
+  if (!defined $len) {
+   $len = 0;
+   $err = "  ($!)";
+  }
+  warn "bayes: write failed to Bayes journal $path ($len of $nbytes)!$err\n";
   last;
 }
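
For readers unfamiliar with this failure mode, a Python sketch of a write loop that retries after partial writes and reports the OS error string on failure, which is the detail the patch adds to the warning. The function name `write_all` and the retry limit are illustrative assumptions; this is not a translation of DBM.pm.

```python
import os
import sys

def write_all(fd, data, path="<journal>", max_tries=5):
    """Write all of data to fd, retrying after partial writes.

    Returns True if every byte was written, False otherwise.
    """
    total = len(data)
    written = 0
    tries = 0
    while written < total and tries < max_tries:
        tries += 1
        try:
            n = os.write(fd, data[written:])
        except OSError as exc:
            # Include the system error string so the cause shows up in the log.
            print(f"bayes: write failed to Bayes journal {path} "
                  f"({written} of {total})! ({exc.strerror})", file=sys.stderr)
            return False
        if n < total - written:
            print(f"bayes: partial write to Bayes journal {path} "
                  f"({written + n} of {total}), recovering", file=sys.stderr)
        written += n
    return written == total
```

A successful open says nothing about whether later writes will succeed, which is why the error string from the failed write itself is the useful piece of information here.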
 




Re: Bayes journal problem

2006-01-25 Thread Enrico Morelli
On Wed, 25 Jan 2006 04:04:28 -0800
Loren Wilton [EMAIL PROTECTED] wrote:

  I add some disk quota to the users that was out of quota, but the
  problem seems unresolved.
 
  Jan 25 12:31:13 alpha spamd[2021]: bayes: write failed to Bayes
  journal /etc/mail/spamassassin/BAYES/bayes_journal (0 of 263856)!
  Jan 25 12:31:13 alpha spamd[2021]: Exiting subroutine via last
 
 This still looks like a quota or permissions problem, or maybe a
 missing home directory for some user.  I think this is saying that
 Bayes could not write to a file in etc/mail/spamassassin/BAYES.  I'm
 guessing this is a common file rather than a per-user file.  So
 perhaps the user does not have permission to write to this directory?
 
 Loren
 

BAYES is a directory containing the DBs used by SpamAssassin. The files
are shared site-wide, not per-user.

# ls -la /etc/mail/spamassassin
drwx------  2 spamc spamc  4096 Jan 25 10:38 BAYES
# ls -la /etc/mail/spamassassin/BAYES
drwx------  2 spamc spamc     4096 Jan 25 10:38 .
drwxr-xr-x  3 root  root      4096 Jan 24 16:45 ..
-rw-------  1 spamc spamc     5154 Jan 25 10:38 bayes.mutex
-rw-rw-rw-  1 spamc spamc        0 Jan 25 13:17 bayes_journal
-rw-------  1 spamc spamc      154 May 27  2004 bayes_journal.orig
-rw-------  1 spamc spamc 41598976 Jan 25 07:58 bayes_seen
-rw-rw-rw-  1 spamc spamc  5414912 Jan 25 07:58 bayes_toks


-- 
---
   (o_
(o_//\  Coltivate Linux che tanto Windows si pianta da solo.
(/)_   V_/_
+--+
| ENRICO MORELLI |  email: [EMAIL PROTECTED]   |
| * *   *   *|  phone: +39 055 4574269 |
|  University of Florence|  fax  : +39 055 4574253 |
|  CERM - via Sacconi, 6 -  50019 Sesto Fiorentino (FI) - ITALY|
+--+


Re: Bayes journal problem

2006-01-25 Thread Matt Kettler
Enrico Morelli wrote:
 Dear all,

 I'm using spamassassin 3.1.0 without problems.
 Starting from today I see the following messages in the log files of my
 mail server:
 Jan 24 16:35:13 alpha spamd[8295]: partial write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal (4040 of 118632), recovering.
 Jan 24 16:35:14 alpha spamd[8293]: partial write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal (4040 of 111984), recovering.
 Jan 24 16:35:14 alpha spamd[8294]: partial write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal (4040 of 117408), recovering.
 Jan 24 16:35:14 alpha spamd[8294]: cannot write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal, aborting!
 Jan 24 16:35:14 alpha spamd[8294]: Exiting subroutine via last at /usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 1073.
 Jan 24 16:35:14 alpha last message repeated 2 times
 Jan 24 16:35:14 alpha spamd[8294]: Exiting eval via last at /usr/lib/perl5/vendor_perl/5.8.6/Mail/SpamAssassin/BayesStore/DBM.pm line 1073.
 Jan 24 16:35:14 alpha spamd[8295]: partial write to Bayes journal /etc/mail/spamassassin/BAYES/bayes_journal (4040 of 118632), recovering.

 What's happening?

   
SA tried to write a large block of data to disk and the OS only allowed
it to write 4040 bytes.

Possible causes:

99% chance of disk full or user quota exceeded.
1% chance of hard disk failure.

Check your system logs and df
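
If df is not handy, a rough equivalent can be scripted. The following is only a sketch: `free_space_report` is a name made up for this example, and it relies on Python's standard `shutil.disk_usage`. It reports how full the filesystem holding a given path is. Note that it knows nothing about per-user quotas, so a quota problem still has to be checked separately.

```python
import shutil

def free_space_report(path):
    """Return (percent_used, free_bytes) for the filesystem holding path.

    Hypothetical helper; percent_used is used/total, which can differ
    slightly from the Use% column that df prints.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return percent_used, usage.free

if __name__ == "__main__":
    # "/" is used so the sketch runs anywhere; substitute the directory
    # that holds your bayes_journal.
    pct, free = free_space_report("/")
    print(f"/ is {pct:.1f}% used, {free} bytes free")
```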




Re: Bayes journal problem

2006-01-25 Thread Enrico Morelli
On Wed, 25 Jan 2006 10:06:13 -0500
Matt Kettler [EMAIL PROTECTED] wrote:

 Enrico Morelli wrote:
  Dear all,
 

 SA tried to write a large block of data to disk and the OS only
 allowed it to write 4040 bytes.
 
 Possible causes:
 
 99% chance of disk full or user quota exceeded.
 1% chance of hard disk failure.
 
 Check your system logs and df
 
 
Yeah!!! Solved. Thanks.
Indeed, the / filesystem was 100% full.



Bayes journal options and SQL

2005-01-07 Thread Rosenbaum, Larry M.
Do the Bayes journal options (bayes_journal_max_size,
bayes_learn_to_journal) have any effect when you use MySQL as the Bayes
database?





Re: Bayes journal options and SQL

2005-01-07 Thread Michael Parker
On Fri, Jan 07, 2005 at 05:39:44PM -0500, Rosenbaum, Larry M. wrote:
 Do the Bayes journal options (bayes_journal_max_size,
 bayes_learn_to_journal) have any effect when you use MySQL as the Bayes
 database?

No.

Michael




Journal?

2004-12-30 Thread MIKE YRABEDRA


For a while now (years) I have employed a single bayes db for my entire
server of users. It used bayes and kept three files: bayes_seen, bayes_toks
and bayes_journal. All of these files had current creation (modification)
dates, so I know they were being used. The thing is, I never used the rebuild
feature to sync the journal; it just happened. Is that normal?

But on to my question...

I recently started allowing the use of user_prefs for individual users.
Everything is working, but only 2 files are being added to the user home
folders: bayes_toks and bayes_seen (no journal).

Is this something I should be concerned about?

The accounts are fairly light, not high traffic. Is there even a need to use
journaling on the individual accounts?

Thanks for any help.


++
Mike Yrabedra (President)
323 Incorporated 
Our Sites:
MacDock.com
MacAgent.com
iTuneAgent.com
MacSurfShop.com
++
W: http://www.323inc.com/
P: 770.382.1195
F: 734.448.5164
E: [EMAIL PROTECTED]
I: ichatmacdock
++
Whatever you do, work at it with all your heart,
as working for the Lord, not for men.
~Colossians 3:23 {{{
++





[2.64] Bayes journal: gibberish entry found

2004-10-11 Thread Martin Schröder
This just appeared in the SA-logs:
--
Oct 11 17:10:06 hostname spamd[16864]: info: setuid to user succeeded 
Oct 11 17:10:06 hostname spamd[16864]: processing message [EMAIL PROTECTED] 
for user:531. 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
hdu4+AdBAHUFuOwHQQBQV+h8bQAAaOAHQQBX6HFtAACDxBCF23QuahnoozQAAFmFwHQiagTopzQA 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
AEBAUI2FJP7//1DoPDUAAFBX6EFtAACDxBTrFo2FJP7//1DoKEAAAFBX6CltAACDxAyF23QsaNgH 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
QQBX6BdtAABoOC9BAGgAL0EAaNAHQQCNhXz+//9Q/xW04UAAg8QY6yFoyAdBAFfo62wAAFlZaCwB 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
AACNhXz+//9QagD/FWTgQABofAdBAFfoymwAAI2FfP7//1BX6M4PAABWV+i2bAAAjUW8UFforGwA 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
AGhwB0EAV+ihbAAAg8Qo6wdqAVjDi2Xog038/4tN8GSJDQBfXlvJw6QnQAC5J0AAwCdAAMcn 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
QADOJ0AA1SdAANwnQADjJ0AA6idAANEsQADYLEAA3yxAAOYsQADtLEAA9CxAAPssQAACLUAACS1A 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
ABAtQAAXLUAAHi1AACUtQABVi+yD7DRWi3UIaPAQQQBWgCYA6BJsAAAPt0UMSFmD+AtZd2H/JIUz 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
N0AAaOgQQQDrS2jcEEEA60Ro1BBBAOs9aMwQQQDrNmjIEEEA6y9owBBBAOsoaLgQQQDrIWiwEEEA 
Oct 11 17:10:07 hostname spamd[16864]: Bayes journal: gibberish entry found: 
6xpopBBBAOsTaJwQQQDrDGiQEEEA6wVohBBBAFbop2sAAFlZD7dFEFt 1097498524 Relief 
Oct 11 17:10:11 hostname spamd[16864]: identified spam (10.4/5.0) for user:531 
in 5.4 seconds, 2419 bytes. 
--

Is this anything to worry about?

TIA
Martin
-- 
   Martin Schröder, [EMAIL PROTECTED]
 ArtCom GmbH, Lise-Meitner-Str 5, 28359 Bremen, Germany
  Voice +49 421 20419-44 / Fax +49 421 20419-10
http://www.artcom-gmbh.de


Re: SA 3.0-RC2 producing extremely large bayes journal files

2004-09-26 Thread Kai Schaetzl
Kai Schaetzl wrote on Sat, 25 Sep 2004 22:34:10 +0200:

 FWIW, the problem seems to have been RC2-specific


I spoke too soon; I just needed to wait another day. So it took 15 days to 
surface this time. I'm going to open a bug on this if I can't find it on 
Bugzilla.

Kai

-- 

Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de  http://msie.winware.org





Re: SA 3.0-RC2 producing extremely large bayes journal files

2004-09-25 Thread Kai Schaetzl
FWIW, the problem seems to have been RC2-specific. Didn't occur after it, 
now going from RC4 to RTM next week. Thanks for all the great work!


Kai






Re: SA 3.0-RC2 producing extremely large bayes journal files

2004-09-14 Thread Kai Schaetzl
Daniel Quinlan wrote on 11 Sep 2004 13:55:08 -0700:

 If you haven't already, please file a bug.


No, I didn't file a bug yet. It happened twice during the testing of RC2 
on the RC2 machine. It didn't happen on the RC3 machine. I applied RC4 
three days ago to both machines and am still waiting for it to happen. I 
suppose it doesn't make much sense to file a bug for RC2 which may have 
been eradicated already.


Kai






SA 3.0-RC2 producing extremely large bayes journal files

2004-09-11 Thread Kai Schaetzl
For about a week I've been seeing SA time-outs in MailScanner (120 sec 
time-out), and on investigation the cause seems to be extremely large 
bayes journal files. I ran sa-learn -D --sync and that took quite long, 
about two minutes. As I understand it, SA should try to sync once a day? So 
it seems that when the sync is due it takes so long that it times 
out under MS. I then took a look at the bayes dir and found this:

-rw-rw-rw-    1 spamd    www            36 Sep 11 12:43 bayes.mutex
-rw-rw-rw-    1 root     www         12968 Sep 11 13:13 bayes_journal
-rw-rw-rw-    1 root     www     170591392 Sep 11 12:04 bayes_journal.old
-rw-rw-rw-    1 spamd    www       2408448 Sep 11 11:47 bayes_seen
-rw-rw-rw-    1 spamd    www      20951040 Sep 11 11:47 bayes_toks

There was a message received at 12:04, then the --sync apparently was 
about to happen, but never finished? There are only a few thousand 
messages arriving per day, many of them whitelisted. It's nearly 
impossible that the journal could grow so large. Not only should it 
sync automatically after some time, I also forced a sync only a few days ago. 
Also, when this happens I find that a lot of swap space is allocated 
although there's still 100 MB or more of free RAM available and only a 
restart of MailScanner frees that up. (I sent a message about this to the 
MS list as well.) But the basic underlying problem seems to be this 
massive journal bloat.
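
To catch this kind of bloat before it causes time-outs, the bayes directory can be watched with a few lines of script. This is only a sketch under assumptions: `bayes_file_sizes` and `oversized` are names invented for the example, the "bayes" filename prefix is taken from the listing above, and the size limit is whatever you consider abnormal for your site.

```python
import os

def bayes_file_sizes(directory):
    """Return (name, size) pairs for bayes* files, largest first."""
    sizes = []
    for entry in os.scandir(directory):
        # Only regular files whose names start with "bayes" are counted.
        if entry.is_file() and entry.name.startswith("bayes"):
            sizes.append((entry.name, entry.stat().st_size))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)

def oversized(sizes, limit_bytes):
    """Names of files whose size exceeds limit_bytes."""
    return [name for name, size in sizes if size > limit_bytes]

if __name__ == "__main__":
    # "." is a placeholder; point this at your own bayes directory.
    found = bayes_file_sizes(".")
    for name, size in found:
        print(f"{size:>12} {name}")
    # 100 MB is an arbitrary limit chosen for this example.
    print("over limit:", oversized(found, 100 * 1024 * 1024))
```

Run against the listing above with that limit, only bayes_journal.old (170591392 bytes) would be flagged.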

This is SA 3.0-RC2 on Suse 9.0 with MailScanner 4.32.5 (I think). I have 
RC3 on an almost identical system and haven't seen the same there yet. 
Were there any changes after RC2 in that area, so that testing RC4 might 
prove useful? Also, could it be any of the Perl modules involved? If so, 
which should I check or upgrade?


Kai






Re: SA 3.0-RC2 producing extremely large bayes journal files

2004-09-11 Thread Daniel Quinlan
Kai Schaetzl [EMAIL PROTECTED] writes:

 For about a week I've been seeing SA time-outs in MailScanner (120 sec 
 time-out) and on investigating it seems the reason are extremely large 
 bayes journal files. I ran sa-learn -D --sync and that took quite long, 
 about two minutes. As I understand SA should try to sync once a day? So, 

If you haven't already, please file a bug.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/


Re: Cannot write to journal and others

2004-09-07 Thread Raquel Rice
On Sun, 5 Sep 2004 13:41:15 -0500
John Fleming [EMAIL PROTECTED] wrote:

 Sep  5 13:23:56 Luke spamd[29971]: cannot write to
 /var/.spamassassin/bayes_journal, Bayes db update ignored

Check the permissions on the directory /var/.spamassassin. I believe
it should be readable and writable by all mail processes, i.e. by all
users that run them.
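
A quick way to test this is to ask the OS directly, while running as the same user the mail process runs as. The sketch below is illustrative only: `can_create_files_in` and `mode_string` are made-up helper names built on Python's standard `os.access` and `stat.filemode`. Keep in mind that `os.access` answers for the user running the script, so running it as root will report success for any existing directory.

```python
import os
import stat

def can_create_files_in(directory):
    """True if the current user may create files in directory.

    Creating a file needs both write and search (execute) permission
    on the directory itself.
    """
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)

def mode_string(path):
    """Return an ls-style permission string such as 'drwx------'."""
    return stat.filemode(os.stat(path).st_mode)

if __name__ == "__main__":
    # "." is a placeholder; use the directory named in your own log line.
    target = "."
    print(mode_string(target), target)
    print("writable by this user:", can_create_files_in(target))
```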

-- 
Raquel

The person born with a talent they are meant to use will find their
greatest happiness in using it.
  --Johann Wolfgang Von Goethe