Re: Bayes db size....

2007-02-19 Thread Ken Menzel
- Original Message - 
From: "Dave Koontz" <[EMAIL PROTECTED]>

To: "'spam mailling list'" 
Sent: Saturday, February 17, 2007 9:30 AM
Subject: Re: Bayes db size



Is there a consensus on this need?  I deal with the seen db issue by
scheduled deletion of that file.  That said, with SA becoming more and
more prominent all the time, I suspect the Average Joe will miss this
oddity until they wind up with a sluggish system, out of drive space, or
other related issues.

I was mostly curious about the logic of NOT doing maintenance on the Seen
and AWL db files.  If there is a consensus this needs to occur, then
perhaps I can take the time to create a proper patch.  I just want to
make sure I am not missing something fundamental here.

Michael Parker wrote:

Dave Koontz wrote:



I use the SQL interface and expire bayes_seen like this.  I believe
6 months to be overly conservative.  I added a lastupdate column as a
timestamp.  For the Perl DBM store, I would recommend you use a similar
technique and update the timestamp in Perl.  It converts nicely to SQL.


Here is my query for cleaning bayes_seen:

mysql -u"$USER" -p"$PW" -h"$SERVER" -e \
"DELETE FROM bayes_seen WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 6 MONTH);" \
"$DB"
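
A sketch of the one-time schema change plus a parameterized version of the same DELETE, assuming MySQL and the stock bayes_seen table from the SQL store. The lastupdate column definition is a guess (Ken's exact definition isn't given), and seen_add_column_sql and seen_expiry_sql are hypothetical helper names. Both only print SQL, so nothing touches a database until the output is piped into the mysql client:

```shell
# Hypothetical helper: print a one-time ALTER TABLE adding a lastupdate
# timestamp to bayes_seen (assumed MySQL syntax; new rows get the insert time).
seen_add_column_sql() {
  echo "ALTER TABLE bayes_seen ADD COLUMN lastupdate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;"
}

# Hypothetical helper: print a DELETE that expires bayes_seen rows whose
# lastupdate is older than N months (default 6, as in the query above).
seen_expiry_sql() {
  months=${1:-6}
  # Only accept a plain whole number, since it is pasted into the SQL text.
  case $months in
    ''|*[!0-9]*)
      echo "seen_expiry_sql: months must be a whole number" >&2
      return 1
      ;;
  esac
  echo "DELETE FROM bayes_seen WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL $months MONTH);"
}
```

For example, seen_expiry_sql 3 | mysql -u"$USER" -p"$PW" -h"$SERVER" "$DB" would expire entries older than three months.  How existing rows are filled in when the column is added varies between MySQL versions, so it is worth checking them before the first cleanup run.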

Hope this helps,
Ken 



Re: Bayes db size....

2007-02-17 Thread Dave Koontz
Is there a consensus on this need?  I deal with the seen db issue by
scheduled deletion of that file.  That said, with SA becoming more and
more prominent all the time, I suspect the Average Joe will miss this
oddity until they wind up with a sluggish system, out of drive space, or
other related issues.

I was mostly curious about the logic of NOT doing maintenance on the Seen
and AWL db files.  If there is a consensus this needs to occur, then
perhaps I can take the time to create a proper patch.  I just want to
make sure I am not missing something fundamental here.

Michael Parker wrote:
> Dave Koontz wrote:
>   
>> I am sure this has been asked numerous times before, but what is the logic
>> in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
>> have been removed from the DB there is little to no use for 'unlearning' any
>> associated messages.  Besides, on a busy system, this seen file gets large
>> very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.
>>
>> 
>
> Patches welcome.
>
> Michael
>
>   
>



Re: Bayes db size....

2007-02-17 Thread Michael Parker
Dave Koontz wrote:
> I am sure this has been asked numerous times before, but what is the logic
> in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
> have been removed from the DB there is little to no use for 'unlearning' any
> associated messages.  Besides, on a busy system, this seen file gets large
> very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.
> 

Patches welcome.

Michael


> 
> -Original Message-
> From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 16, 2007 7:19 PM
> To: spam mailling list
> Subject: Re: Bayes db size
> 
> On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
>> So you're saying that right now seen isn't capped like tokens, right?
> 
> seen has no max size nor expiry features.
> 
> --
> Randomly Selected Tagline:
> "Like any French restaurant in America, it was overpriced, noisy, moody,
> and would put you in mortal danger if you had an accident with anything
> larger than a croissant." - Unknown about the Renault LeCar
> 
> 



RE: Bayes db size....

2007-02-17 Thread Dave Koontz
I am sure this has been asked numerous times before, but what is the logic
in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
have been removed from the DB there is little to no use for 'unlearning' any
associated messages.  Besides, on a busy system, this seen file gets large
very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.


-Original Message-
From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 16, 2007 7:19 PM
To: spam mailling list
Subject: Re: Bayes db size

On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
> So you're saying that right now seen isn't capped like tokens, right?

seen has no max size nor expiry features.

--
Randomly Selected Tagline:
"Like any French restaurant in America, it was overpriced, noisy, moody,
and would put you in mortal danger if you had an accident with anything
larger than a croissant." - Unknown about the Renault LeCar




Re: Bayes db size....

2007-02-16 Thread Theo Van Dinter
On Fri, Feb 16, 2007 at 06:45:51PM -0600, Robert Nicholson wrote:
> Well, I only care about tokens and not repeated emails, so can I
> disable seen?

You can't disable it, but you can delete it, as previously stated.

-- 
Randomly Selected Tagline:
54% of all statistics are made up.  No, make that 82%...




Re: Bayes db size....

2007-02-16 Thread Robert Nicholson
Well, I only care about tokens and not repeated emails, so can I
disable seen?


On Feb 16, 2007, at 6:19 PM, Theo Van Dinter wrote:

> On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
>> So you're saying that right now seen isn't capped like tokens, right?
>
> seen has no max size nor expiry features.
>
> --
> Randomly Selected Tagline:
> "Like any French restaurant in America, it was overpriced, noisy, moody,
>  and would put you in mortal danger if you had an accident with anything
>  larger than a croissant." - Unknown about the Renault LeCar




Re: Bayes db size....

2007-02-16 Thread Theo Van Dinter
On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
> So you're saying that right now seen isn't capped like tokens, right?

seen has no max size nor expiry features.

-- 
Randomly Selected Tagline:
"Like any French restaurant in America, it was overpriced, noisy, moody,
 and would put you in mortal danger if you had an accident with anything
 larger than a croissant." - Unknown about the Renault LeCar




Re: Bayes db size....

2007-02-16 Thread Robert Nicholson

So you're saying that right now seen isn't capped like tokens, right?

On Feb 16, 2007, at 5:45 PM, Theo Van Dinter wrote:

> On Fri, Feb 16, 2007 at 05:42:13PM -0600, Robert Nicholson wrote:
>> Why then is my Bayes DB 20MEG in size right now if
>> =item bayes_expiry_max_db_size  (default: 150000)
>
> That's in number of tokens, not physical size in bytes.
>
>> 100,000 tokens, whichever has a larger value.  150,000 tokens is roughly
>> equivalent to an 8Mb database file.
>
> That's an estimate, and it depends on your platform, libraries, etc.
>
>> How do I control the size of the _seen file?
>
> You can delete it if you want to.  You'll be able to relearn messages,
> but that may not be an issue for you.
>
> --
> Randomly Selected Tagline:
> "Truly unencumbered by the engineering process."
>  - Unknown about the Renault Dauphine




Re: Bayes db size....

2007-02-16 Thread Theo Van Dinter
On Fri, Feb 16, 2007 at 05:42:13PM -0600, Robert Nicholson wrote:
> Why then is my Bayes DB 20MEG in size right now if
> =item bayes_expiry_max_db_size  (default: 150000)

That's in number of tokens, not physical size in bytes.

> 100,000 tokens, whichever has a larger value.  150,000 tokens is roughly
> equivalent to an 8Mb database file.

That's an estimate, and it depends on your platform, libraries, etc.
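
To make the units concrete, here is a rough sketch.  It assumes the full expiry rule from the bayes_expiry_max_db_size documentation (an expiry run keeps 75% of the maximum or 100,000 tokens, whichever is larger) and reads "150,000 tokens is roughly 8Mb" as 8 MB; as noted, the real byte size varies by platform and DB library.  bayes_expiry_target and bayes_size_estimate_kb are hypothetical names:

```shell
# Hypothetical helper: tokens kept after an expiry run, i.e. the larger
# of 75% of bayes_expiry_max_db_size and 100,000 (assumed rule).
bayes_expiry_target() {
  max=$1
  keep=$(( max * 3 / 4 ))
  if [ "$keep" -lt 100000 ]; then
    keep=100000
  fi
  echo "$keep"
}

# Hypothetical helper: very rough on-disk size in KB for a token count,
# scaled linearly from "150,000 tokens ~ 8 MB" (8192 KB).
bayes_size_estimate_kb() {
  echo $(( $1 * 8192 / 150000 ))
}
```

With the default maximum, bayes_expiry_target 150000 prints 112500, and bayes_size_estimate_kb 112500 prints 6144, i.e. roughly 6 MB after an expiry run by that rule of thumb.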

> How do I control the size of the _seen file?

You can delete it if you want to.  You'll be able to relearn messages,
but that may not be an issue for you.
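
A minimal sketch of that advice as a nightly cron job, assuming the default DBM layout where the file is ~/.spamassassin/bayes_seen (the real name depends on bayes_path and the DB library in use, so check your own directory first).  prune_bayes_seen and its 50 MB default limit are hypothetical choices.  Once the file is gone, messages that were already learned can be learned a second time:

```shell
# Hypothetical helper: remove the Bayes "seen" file once it grows past a
# size limit given in KB. The token database itself is not touched.
prune_bayes_seen() {
  seen_file=${1:-"$HOME/.spamassassin/bayes_seen"}  # assumed default location
  limit_kb=${2:-51200}                              # assumed limit: 50 MB
  # Nothing to do if the file does not exist (e.g. already pruned).
  [ -f "$seen_file" ] || return 0
  # Apparent file size in KB, measured with wc so it works on any POSIX system.
  size_kb=$(( $(wc -c < "$seen_file") / 1024 ))
  if [ "$size_kb" -gt "$limit_kb" ]; then
    rm -f "$seen_file"
    echo "removed $seen_file ($size_kb KB)"
  fi
}
```

A crontab entry would source the file and call, for example, prune_bayes_seen "$HOME/.spamassassin/bayes_seen" 51200.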

-- 
Randomly Selected Tagline:
"Truly unencumbered by the engineering process."
 - Unknown about the Renault Dauphine

