Re: why does SA without autolearn need bayes read-write?

2015-02-01 Thread RW
On Sat, 31 Jan 2015 16:46:28 +0100
Reindl Harald wrote:

> according to the documentation *it is* a bug:

That's just a wiki entry. 

 
> http://wiki.apache.org/spamassassin/SiteWideBayesSetup
> Please note this directory needs to be RWX for all users that 
> SpamAssassin will be executed as, or R-X if autolearning and
> automatic expiry are disabled



Re: why does SA without autolearn need bayes read-write?

2015-01-31 Thread Reindl Harald

according to the documentation *it is* a bug:

http://wiki.apache.org/spamassassin/SiteWideBayesSetup
Please note this directory needs to be RWX for all users that 
SpamAssassin will be executed as, or R-X if autolearning and automatic 
expiry are disabled


bayes_auto_expire 0
bayes_auto_learn 0
bayes_learn_during_report 0

rsyslog.conf for now masks it:
:msg, contains, "bayes db update ignored: Read-only file system" stop

On 29.01.2015 at 18:34, RW wrote:

> On Wed, 28 Jan 2015 15:58:56 +0100
> Reindl Harald wrote:
>
>> * first:  it is a bug to write/lock when auto_expire / auto_learn is
>> off
>
> As I said, it's not a bug. The updates are done in case you want to
> expire later with sa-learn --force-expire.
>
> Auto-expiry means performing the expiry automatically when the database
> goes over its configured token limit. Most people don't do this because
> the expiry is then done during a classification, which can cause
> a timeout.
>
> Setting "auto_expire 0" is not a way of telling SA that you aren't going
> to expire the database.
>
> On Wed, 28 Jan 2015 01:03:37 +0100
> Reindl Harald wrote:
>
>> ... even if we decide to kill spam samples older than x
>> months it needs to be done properly to keep the 50% spam / 50% ham
>> ratio which is the reason the bayes works so well
>
> The ratio doesn't matter; it's a myth that it should be 50:50 or match
> the ratio in your mail.
>
> What's important is that you learn enough ham and enough spam, and that
> the training is correct and sufficiently representative. It is
> preferable that there isn't a big mismatch between the ham/spam ratio
> in the corpus as a whole and in recently added mail as that can skew
> the probabilities of new tokens.
>
>> compared with
>> autolearning setups where every one i have seen in the past 8 years
>> became worse each month until it classified most ham as spam and let
>> through the real crap
>
> It works for some, but when it fails it's not because the ratio of
> spam to ham is wrong, it's because of a combination of mistraining,
> inadequate ham and poor choices in what's learned.




signature.asc
Description: OpenPGP digital signature


Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread RW
On Wed, 28 Jan 2015 15:58:56 +0100
Reindl Harald wrote:


> * first:  it is a bug to write/lock when auto_expire / auto_learn is
> off

As I said, it's not a bug. The updates are done in case you want to
expire later with sa-learn --force-expire. 

Auto-expiry means performing the expiry automatically when the database
goes over its configured token limit. Most people don't do this because
the expiry is then done during a classification, which can cause
a timeout.

Setting "auto_expire 0" is not a way of telling SA that you aren't going
to expire the database.
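RW's point is easier to see with a toy model. Expiry is driven by when each token was last seen, so even with "bayes_auto_expire 0" a scan records that access time for a later sa-learn --force-expire; a store that cannot be written would make tokens that still occur in live mail look stale. The sketch below is a simplified in-memory model, not SpamAssassin's code: Token, touch and expire are hypothetical names, and real expiry picks an atime cutoff rather than sorting every token. The 75% / 100,000-token retention rule is my reading of the bayes_expiry_max_db_size documentation (a token count, not megabytes) and should be treated as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Token:
    spam_count: int
    ham_count: int
    atime: float  # last time this token was learned or seen in a scanned message


def touch(db: dict[str, Token], message_tokens: set[str], now: float) -> int:
    """Refresh the atime of known tokens found while classifying one message.

    This is a write caused by plain classification, i.e. the update that a
    read-only Bayes store cannot record. Returns how many tokens were updated.
    """
    updated = 0
    for tok in message_tokens:
        entry = db.get(tok)
        if entry is not None and entry.atime < now:
            entry.atime = now
            updated += 1
    return updated


def expire(db: dict[str, Token], max_db_size: int, min_keep: int = 100_000) -> int:
    """Drop the least recently seen tokens once the db exceeds max_db_size.

    Assumption (see lead-in): keep the newest max(75% of max_db_size, min_keep)
    tokens. Returns the number of tokens removed.
    """
    if len(db) <= max_db_size:
        return 0
    keep = max(int(max_db_size * 0.75), min_keep)
    newest_first = sorted(db, key=lambda t: db[t].atime, reverse=True)
    stale = newest_first[keep:]
    for tok in stale:
        del db[tok]
    return len(stale)


if __name__ == "__main__":
    # Ten tokens learned at t=0..9; a message scanned at t=100 contains tok0 and tok1.
    db = {f"tok{i}": Token(1, 0, atime=float(i)) for i in range(10)}
    touch(db, {"tok0", "tok1"}, now=100.0)
    print(expire(db, max_db_size=8, min_keep=0), sorted(db))
```

Without the touch() call, tok0 and tok1 would be the first tokens expired even though current mail still contains them; that is the information the atime writes preserve.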



On Wed, 28 Jan 2015 01:03:37 +0100
Reindl Harald wrote:

> ... even if we decide to kill spam samples older than x
> months it needs to be done properly to keep the 50% spam / 50% ham
> ratio which is the reason the bayes works so well 

The ratio doesn't matter; it's a myth that it should be 50:50 or match
the ratio in your mail. 


What's important is that you learn enough ham and enough spam, and that
the training is correct and sufficiently representative. It is
preferable that there isn't a big mismatch between the ham/spam ratio
in the corpus as a whole and in recently added mail as that can skew
the probabilities of new tokens.
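Both halves of RW's answer can be checked with the raw per-token estimate used by Graham/Robinson-style filters, where each count is divided by the number of messages learned in that class. This is a simplified sketch: it leaves out Robinson's smoothing and SpamAssassin's own constants, and the function name and the numbers are illustrative only.

```python
def token_spam_prob(spam_hits: int, ham_hits: int, nspam: int, nham: int) -> float:
    """Raw spam probability of one token, Graham/Robinson style.

    spam_hits/ham_hits: learned spam/ham messages containing the token.
    nspam/nham: total spam/ham messages learned.
    Normalising by nspam and nham cancels the corpus-wide spam:ham ratio.
    """
    spam_rate = spam_hits / nspam
    ham_rate = ham_hits / nham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen in either class: neutral
    return spam_rate / (spam_rate + ham_rate)


if __name__ == "__main__":
    # 1) Same per-class frequency in a 1:1 and in a 4:1 corpus: identical estimate.
    print(token_spam_prob(100, 10, 1000, 1000), token_spam_prob(400, 10, 4000, 1000))

    # 2) A new token, equally common (10%) in recent spam and recent ham,
    #    learned on top of a 10000 spam / 10000 ham corpus:
    #    balanced recent batch (500 spam + 500 ham) -> neutral
    print(token_spam_prob(50, 50, 10500, 10500))
    #    lopsided recent batch (900 spam + 100 ham) -> looks spammy
    print(token_spam_prob(90, 10, 10900, 10100))
```

The second case is RW's caveat: the new token's counts come only from the recent batch but are divided by corpus-wide totals, so a neutral token drifts toward spam (about 0.89 here) when recent training is mostly spam.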


> compared with
> autolearning setups where every one i have seen in the past 8 years
> became worse each month until it classified most ham as spam and let
> through the real crap

It works for some, but when it fails it's not because the ratio of
spam to ham is wrong, it's because of a combination of mistraining,
inadequate ham and poor choices in what's learned. 


Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread Reindl Harald

On 29.01.2015 at 16:23, John Hardin wrote:

> On Thu, 29 Jan 2015, Reindl Harald wrote:
>
>> On 29.01.2015 at 10:18, Matus UHLAR - fantomas wrote:
>>
>>> On 28.01.15 01:03, Reindl Harald wrote:
>>>> if i understand you correctly, we agree that there is no reason /var
>>>> can't be mounted read-only?
>>>
>>> I do not agree. The whole point of /var is to contain varying data and
>>> mounting it read-only defeats the whole purpose of /var.
>>
>> i am not talking about a separate partition
>>
>> i am talking about a *systemd namespace* and the intention of *not* having
>> anything below /var writable for a network-facing service
>
> "no reason /var can't be mounted read-only" does *not* suggest that.

* the initial post makes it pretty clear
* it was even quoted in fantomas' first reply on this thread
* i made that clear multiple times

---- Forwarded Message ----
Subject: Re: why does SA without autolearn need bayes read-write?
Date: Wed, 28 Jan 2015 15:04:26 +0100
From: Reindl Harald 
To: users@spamassassin.apache.org

no need to mount separate partitions on recent linux systems;
that's what namespaces are for, and systemd has easy interfaces for them

---- Forwarded Message ----
Subject: Re: why does SA without autolearn need bayes read-write?
Date: Tue, 27 Jan 2015 13:44:33 +0100
From: Matus UHLAR - fantomas 
To: users@spamassassin.apache.org

On 27.01.15 03:01, Reindl Harald wrote:
> with "bayes_auto_learn 0" there is no reason to lock the bayes
> database and the spamd-service should be happy with
> "ReadOnlyDirectories=/var/lib"

the bayes database contains not only tokens, but also timestamps used 
for expiration. That's why you need to write to them.


---- Forwarded Message ----
Subject: why does SA without autolearn need bayes read-write?
Date: Tue, 27 Jan 2015 03:01:10 +0100
From: Reindl Harald 
To: Mailing-List spamassassin 

IMHO that is a bug

with "bayes_auto_learn 0" there is no reason to lock the bayes database
and the spamd-service should be happy with "ReadOnlyDirectories=/var/lib"

training and sa-update are done from a shell, independent of network-aware
services

Jan 27 02:52:58 testserver spamd[2794]: bayes: cannot write to /var/lib/spamass-milter/.spamassassin/bayes_journal, bayes db update ignored: Read-only file system
Jan 27 02:52:58 testserver spamd[2794]: spamd: clean message (0.5/5.5) for sa-milt:189 in 0.5 seconds, 804 bytes.
Jan 27 02:52:58 testserver spamd[2794]: spamd: result: . 0 - ALL_TRUSTED,BAYES_50,T_RP_MATCHES_RCVD scantime=0.5,size=804,user=sa-milt,uid=189,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=20782,mid=<54c6ef78.8090...@testserver.rhsoft.net>,bayes=0.40,autolearn=disabled





Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread John Hardin

On Thu, 29 Jan 2015, Reindl Harald wrote:

> On 29.01.2015 at 10:18, Matus UHLAR - fantomas wrote:
>
>> On 28.01.15 01:03, Reindl Harald wrote:
>>> if i understand you correctly, we agree that there is no reason /var
>>> can't be mounted read-only?
>>
>> I do not agree. The whole point of /var is to contain varying data and
>> mounting it read-only defeats the whole purpose of /var.
>
> i am not talking about a separate partition
>
> i am talking about a *systemd namespace* and the intention of *not* having
> anything below /var writable for a network-facing service

"no reason /var can't be mounted read-only" does *not* suggest that.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Political Correctness is a doctrine which is based on the premise
  that it is possible, through nothing more than a suitable choice
  of words, to pick up a turd by the clean end.
---
 3 days until the 12th anniversary of the loss of STS-107 Columbia


Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread Reindl Harald


On 29.01.2015 at 10:18, Matus UHLAR - fantomas wrote:

> On 28.01.15 01:03, Reindl Harald wrote:
>> if i understand you correctly, we agree that there is no reason /var
>> can't be mounted read-only?
>
> I do not agree. The whole point of /var is to contain varying data and
> mounting it read-only defeats the whole purpose of /var.


i am not talking about a separate partition

i am talking about a *systemd namespace* and the intention of *not* having 
anything below /var writable for a network-facing service


frankly - can we stop going off in all directions?

i asked for the spamd service not to touch the bayes, for good reasons; 
i know the setup, and there are well-considered reasons why every piece is 
the way it is. if it's not possible: can it be made possible, is someone 
willing to implement it for money, and how much money? that's it

__


> I see the following possibilities for you:
> - move BAYES to a database of any kind


for sure not, the bayes is built with a script and rsynced to other 
machines which have to work *independently* of each other, so there 
is no point in setting up a database with replication, failovers and a lot of 
time investment when things can be simple



> - set up SA to learn to journal, and use overlayfs for the journal
>   (remember to set bayes_journal_max_size big enough),
>   dropping it or syncing periodically


it is big enough

use_learner 1
use_bayes 1
use_bayes_rules 1
bayes_use_hapaxes 1
bayes_expiry_max_db_size 250
bayes_auto_expire 0
bayes_auto_learn 0
bayes_learn_during_report 0
bayes_learn_to_journal 1

>> the intention of this *global bayes* is *not* to learn or expire
>> anything - the implemented "remove from bayes" method is just to remove
>> the message from the corpus folder and type "sa-learn.sh rebuild"
>
> I believe it's much more effective to expire old tokens
> that are not appearing in mail than to purge old mail
> from DB, when you don't know if the tokens
> are still used or not.
>
> I'm afraid you got the expire issue wrong...

i got nothing wrong

it doesn't matter to me if tokens are not used for two months; 10 years of 
experience show they re-appear sooner or later, and i don't invest 
hundreds of work-hours collecting thousands of mail samples just to have 
tokens expire automatically


the bayes works *perfectly*, and frankly, when we started with SA a large part 
of the spam bayes was built from years-old archive data






Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread Matus UHLAR - fantomas

On 27.01.15 18:49, Reindl Harald wrote:
> the intention of this *global bayes* is *not* to learn or expire 
> anything - the implemented "remove from bayes" method is just to remove 
> the message from the corpus folder and type "sa-learn.sh rebuild"


I believe it's much more effective to expire old tokens that are not appearing
in mail than to purge old mail from DB, when you don't know if the tokens
are still used or not.

I'm afraid you got the expire issue wrong...
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
"One World. One Web. One Program." - Microsoft promotional advertisement
"One people, one empire, one leader!" - Adolf Hitler


Re: why does SA without autolearn need bayes read-write?

2015-01-29 Thread Matus UHLAR - fantomas

On 28.01.15 01:03, Reindl Harald wrote:
> if i understand you correctly, we agree that there is no reason /var 
> can't be mounted read-only?


I do not agree. The whole point of /var is to contain varying data and
mounting it read-only defeats the whole purpose of /var.

I see the following possibilities for you:
- move BAYES to a database of any kind
- set up SA to learn to journal, and use overlayfs for the journal
  (remember to set bayes_journal_max_size big enough),
  dropping it or syncing periodically
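Matus's second suggestion relies on the journal. As I understand it, scan-time atime updates go to bayes_journal (the file named in the "cannot write" log line in the first message), and with "bayes_learn_to_journal 1" learning does too; a later sync folds the journal into the token database, or, with an overlayfs upper layer, the journal can simply be thrown away. The toy model below only illustrates that split. It does not reproduce SpamAssassin's journal format, and JournaledStore, its methods and journal_max (an entry count standing in for the byte-based bayes_journal_max_size) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class JournaledStore:
    """Toy Bayes store: scans only append to a journal; sync() writes the db."""
    atimes: dict[str, int]  # token -> last-seen time (stands in for the token db)
    journal: list[tuple[str, int]] = field(default_factory=list)
    journal_max: int = 1000  # entries, not bytes (simplification)

    def record_scan(self, tokens: set[str], now: int) -> bool:
        """Journal atime updates for known tokens; return True once a sync is due."""
        for tok in sorted(tokens):
            if tok in self.atimes:
                self.journal.append((tok, now))
        return len(self.journal) >= self.journal_max

    def sync(self) -> int:
        """Fold the journal into the db, keeping the newest atime, then truncate it."""
        applied = 0
        for tok, seen_at in self.journal:
            if tok in self.atimes and seen_at > self.atimes[tok]:
                self.atimes[tok] = seen_at
                applied += 1
        self.journal.clear()
        return applied

    def drop_journal(self) -> None:
        """Discard pending updates, e.g. when a throwaway overlay is reset."""
        self.journal.clear()
```

In this model the db itself is only written by sync() (in SA terms, sa-learn --sync), which can run from the maintenance shell rather than from the network-facing spamd.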



Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Axb

On 01/28/2015 05:00 PM, Reindl Harald wrote:

> BTW it is not visible which users on this list are core developers and
> which are not - until now i thought you were one, for example


I am part of the dev team, or as you say "core-developers", which doesn't 
mean I have to be a Perl monger. There are other tasks...


Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Reindl Harald


On 28.01.2015 at 16:52, Axb wrote:

> On 01/28/2015 04:38 PM, Reindl Harald wrote:
>
>> is AFAIK relevant in context of sa-learn to not re-train the same
>> messages again and again - and it has its own bugs, because for a few
>> messages it contains random parts of the message itself; running sa-learn
>> on the whole corpus would add these messages to "bayes_toks" each time
>>
>> see two example snippets below
>> hence it is that large here
>>
>> -rw--- 1 sa-milt sa-milt 5,4K 2015-01-28 16:34 bayes_journal
>> -rw--- 1 sa-milt sa-milt 1,3M 2015-01-28 16:12 bayes_seen
>> -rw--- 1 sa-milt sa-milt  40M 2015-01-28 16:33 bayes_toks
>> -rw--- 1 sa-milt sa-milt   98 2014-08-21 17:47 user_prefs
>> _
>
> something here does NOT make sense
>
> 1.3 MB of seen against 40MB tokens.
>
> someone please correct me if I'm wrong:
>
> afaik, this probably means you've deleted bayes_seen so bayes has lost
> its record of what it has processed so it will relearn stuff you
> already fed it.


no, i explained what happens in the part you stripped from the quote - 
it randomly contains complete message parts, no matter how often i 
delete *all files* in the user's home directory and rebuild from scratch


if i delete "bayes_seen", then it happens as part of a complete reset, with 
sa-learn.sh using sa-learn to *rebuild from scratch* from the permanently 
stored raw mails in the folders "ham" and "spam"






Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Reindl Harald



On 28.01.2015 at 16:39, Axb wrote:

> On 01/28/2015 03:58 PM, Reindl Harald wrote:
>
>> * third:  if you were a smart upstream, then when a company admin
>>    asks for a change, instead of "write a patch" you could make
>>    an offer, talking about money, to include the change in the
>>    next upstream version - we have sponsored changes and maintenance
>>    of projects like DBMail, Netatalk and others multiple times
>>    in recent years - just because instead of pert responses we got a
>>    friendly "i am not that much interested, but i guess the amount
>>    of time will be xx hours at xx € per hour, so i am open"
>
> I'm not a usable Perl programmer either but I put my cash where my mouth
> is and have $pon$ored several major features in SA.

i would sponsor things too, if it's worth it

> Instead of complaining, ranting and/or being frustrated, it's way more
> productive to open a feature request in bugzilla and sweet talk one of
> the main devs to add your enhancement (if they consider it worthy) or
> get someone to code it for you so you can use it in your deployment/SA
> fork. I promise you, it works.

the ranting and being frustrated come from the hostile manner on this 
list in response to *every question*, starting with the reply to my first 
post last year in the style of "go away, we don't care about milter", up to 
"you are outright lying", until i proved that my observations were right


frankly, the intention of first writing a mail to a mailing list before 
making a feature request is to find out if there's a way to change the behavior 
via configuration, and all the "creep away"-style responses don't do 
anything good for starting "sweet talk", nor do they give the feeling that any 
feature request is welcome at all


another reason for first writing a mail to a list is my own developer 
experience, where users asked for things hundreds of times and i was 
able to change some behavior regression-free while the user was still on 
the phone - not every change needs bureaucracy


BTW it is not visible which users on this list are core developers and 
which are not - until now i thought you were one, for example







Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Axb

On 01/28/2015 04:38 PM, Reindl Harald wrote:

> is AFAIK relevant in context of sa-learn to not re-train the same
> messages again and again - and it has its own bugs, because for a few
> messages it contains random parts of the message itself; running sa-learn
> on the whole corpus would add these messages to "bayes_toks" each time
>
> see two example snippets below
> hence it is that large here
>
> -rw--- 1 sa-milt sa-milt 5,4K 2015-01-28 16:34 bayes_journal
> -rw--- 1 sa-milt sa-milt 1,3M 2015-01-28 16:12 bayes_seen
> -rw--- 1 sa-milt sa-milt  40M 2015-01-28 16:33 bayes_toks
> -rw--- 1 sa-milt sa-milt   98 2014-08-21 17:47 user_prefs
> _


something here does NOT make sense

1.3 MB of seen against 40MB tokens.

someone please correct me if I'm wrong:

afaik, this probably means you've deleted bayes_seen so bayes has lost 
its record of what it has processed so it will relearn stuff you 
already fed it.
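Axb's reasoning hinges on what bayes_seen is for: it remembers which messages were already learned, and as which class, so re-running sa-learn over the same corpus does not count them twice. The sketch below models only that bookkeeping (no token counting). Learner and learn_id are hypothetical, and the fallback key merely imitates the "<hex>@sa_generated" ids visible in Reindl's dump; SpamAssassin derives its own ids from different inputs.

```python
import hashlib
from email import message_from_string
from email.message import Message


def learn_id(msg: Message) -> str:
    """Key under which a learned message is remembered.

    Uses Message-ID when present; otherwise a SHA-1 over a few headers and the
    body (hypothetical inputs, chosen only for illustration).
    """
    mid = msg.get("Message-ID")
    if mid:
        return mid.strip()
    basis = msg.get("Date", "") + msg.get("From", "") + str(msg.get_payload())
    return hashlib.sha1(basis.encode("utf-8", "replace")).hexdigest() + "@sa_generated"


class Learner:
    def __init__(self) -> None:
        self.seen: dict[str, str] = {}  # learn id -> "s" or "h"; plays the role of bayes_seen
        self.nspam = 0
        self.nham = 0

    def learn(self, raw: str, is_spam: bool) -> bool:
        """Learn a message once; return False if it was already learned as this class."""
        key = learn_id(message_from_string(raw))
        label = "s" if is_spam else "h"
        previous = self.seen.get(key)
        if previous == label:
            return False  # already counted: skipping keeps relearning idempotent
        if previous == "s":
            self.nspam -= 1  # reclassified: undo the earlier spam count
        elif previous == "h":
            self.nham -= 1  # reclassified: undo the earlier ham count
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        self.seen[key] = label
        return True
```

Clearing seen while keeping the counts (the equivalent of deleting bayes_seen but not bayes_toks) makes the next run count the same message again, which is the failure Axb suspects; a rebuild that also starts from an empty token database, as Reindl describes, does not have that problem.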


Also, a 40MB tokens DB file will not exactly help your speed.

if you don't want to use Redis then at least use SDBM which is way faster.

local.cf:

bayes_store_module   Mail::SpamAssassin::BayesStore::SDBM

and restore/relearn your corpus






Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Axb

On 01/28/2015 03:58 PM, Reindl Harald wrote:

> * third:  if you were a smart upstream, then when a company admin
>    asks for a change, instead of "write a patch" you could make
>    an offer, talking about money, to include the change in the
>    next upstream version - we have sponsored changes and maintenance
>    of projects like DBMail, Netatalk and others multiple times
>    in recent years - just because instead of pert responses we got a
>    friendly "i am not that much interested, but i guess the amount
>    of time will be xx hours at xx € per hour, so i am open"


I'm not a usable Perl programmer either but I put my cash where my mouth 
is and have $pon$ored several major features in SA.


Instead of complaining, ranting and/or being frustrated, it's way more 
productive to open a feature request in bugzilla and sweet talk one of 
the main devs to add your enhancement (if they consider it worthy) or 
get someone to code it for you so you can use it in your deployment/SA 
fork. I promise you, it works.


EOT





Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Reindl Harald



On 28.01.2015 at 16:24, Axb wrote:

> On 01/28/2015 03:58 PM, Reindl Harald wrote:
>
>> * first:  it is a bug to write/lock when auto_expire / auto_learn is off
>
> bayes_seen


is AFAIK relevant in context of sa-learn to not re-train the same 
messages again and again - and it has its own bugs, because for a few 
messages it contains random parts of the message itself; running sa-learn 
on the whole corpus would add these messages to "bayes_toks" each time


see two example snippets below
hence it is that large here

-rw--- 1 sa-milt sa-milt 5,4K 2015-01-28 16:34 bayes_journal
-rw--- 1 sa-milt sa-milt 1,3M 2015-01-28 16:12 bayes_seen
-rw--- 1 sa-milt sa-milt  40M 2015-01-28 16:33 bayes_toks
-rw--- 1 sa-milt sa-milt   98 2014-08-21 17:47 user_prefs
_

^G^H^G^F^F<9A>^F<98>^Fb^F`^F*^F(^F^E^E^E^E<82>^E<80>^EJ^EH^E^R^E^P^E^D^D^D^Dj^Dh^D2^D0^D^C^C^C^C<8A>^C<88>^CR^CP^C^Z^C^X^CCORATION: 
underline

=7D



bgColor=3D=23ff>



  

  color=3D=23=

ff>For
  Immediate Release
  

  color^Ah^Afe6ea55025493eb288d63d54543b277d5d112c74@sa_generated^Ah^Af9b8d0a0253cba315ff4852870be8fc1bad03318@sa_generated^Ah^Af89aef32ae61c7084c4043b1234f13c1e0da74c1

@sa_generated^Ah^Af6cf9fe43d4279181f91b88d2be31914290f664f@sa_generated^Ah^Aed8cc17c1c67d46bbb3ad34dd8cba7d4daa80249@sa_generated^Ah^Adeeec278351d9105bd116971465e502cc35becbc@sa_generated^Ah^Ad9d0a11654680fe56d3
7f10f4fcc4b7205e768a9@sa_generated^Ah^
_

 ...when an announcement, news item or campaign has to be delivered,
design, printing, agency,

 calling, distribution etc.: you can reach your target audience directly
without these losses of time.

Practical: the message is almost certain to reach the
consumer. A report can be obtained when it arrives or is delivered.

Effective: you can get instant feedback on the
bulk messages you send
_
Bulk messaging is quite economical compared with other advertising tools.





Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Axb

On 01/28/2015 03:58 PM, Reindl Harald wrote:

> * first:  it is a bug to write/lock when auto_expire / auto_learn is off


bayes_seen


Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Martin Gregorie
On Wed, 2015-01-28 at 15:04 +0100, Reindl Harald wrote:
> no need for mount own partitions on recent linux systems
> that's what namespaces are for and systemd has easy interfaces
> 
Fair enough: I thought you were talking about some sort of site-wide
read-only mount, but using systemd to limit the read-only access to SA
is nice.


Martin





Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Reindl Harald


On 28.01.2015 at 15:46, Axb wrote:

> On 01/28/2015 03:18 PM, Kevin A. McGrail wrote:
>
>> On 1/28/2015 9:04 AM, Reindl Harald wrote:
>>
>>> my main point is that i don't want the locking IO when nothing but
>>> the self-developed maintenance scripts for the bayes has any business
>>> writing anything there - it should only be read, and in the best case
>>> opened only once in the lifetime of each spamc-forker for best
>>> performance
>>
>> A) I have a feeling using Redis will provide the fastest performance
>> either way...
>
> afaik, Redis requires "bayes_auto_expire  1" but one can set a huge TTL
> for "bayes_token_ttl" &  "bayes_seen_ttl"
>
> Of course, Redis also causes I/O when it dumps to disk but in all the SA
> noise
>
> I don't understand why Reindl is so scared of the Bayes file based I/O

i am scared of the read-only-fs warnings cluttering the logs when 
there is no business writing anything

> Using modern hardware, the DB file type is slower than any I/O but
> then...  Let's assume he's scared of speed coz he does scans during the
> smtp sessions  AND he's using the default DB backend instead of the
> faster SDBM (or Redis) :)

i avoid additional complexity and dependencies for damned good reasons; 
the last time i didn't, in the case of prosody (jabber server), where i 
used sqlite instead of the plaintext defaults, it cost me a lot of wasted 
time after a distro upgrade

> but then, he'll supply the patch. BAZINGA!


what a hostile reaction to reports

* first:  it is a bug to write/lock when auto_expire / auto_learn is off
* second: i am not a perl developer
* third:  if you were a smart upstream, then when a company admin
  asks for a change, instead of "write a patch" you could make
  an offer, talking about money, to include the change in the
  next upstream version - we have sponsored changes and maintenance
  of projects like DBMail, Netatalk and others multiple times
  in recent years - just because instead of pert responses we got a
  friendly "i am not that much interested, but i guess the amount
  of time will be xx hours at xx € per hour, so i am open"

point 3 is, BTW, the reason why DBMail 3.x still has the native autoreply 
feature - so you developers should consider acting somewhat less hostile 
and smarter about user requests, and even make money with it






Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Axb

On 01/28/2015 03:18 PM, Kevin A. McGrail wrote:

> On 1/28/2015 9:04 AM, Reindl Harald wrote:
>
>> my main point is that i don't want the locking IO when nothing but
>> the self-developed maintenance scripts for the bayes has any business
>> writing anything there - it should only be read, and in the best case
>> opened only once in the lifetime of each spamc-forker for best
>> performance
>
> A) I have a feeling using Redis will provide the fastest performance
> either way...


afaik, Redis requires "bayes_auto_expire  1" but one can set a huge TTL 
for "bayes_token_ttl" &  "bayes_seen_ttl"


Of course, Redis also causes I/O when it dumps to disk but in all the SA 
noise


I don't understand why Reindl is so scared of the Bayes file based I/O.
Using modern hardware, the DB file type is slower than any I/O but 
then...  Let's assume he's scared of speed coz he does scans during the 
smtp sessions  AND he's using the default DB backend instead of the 
faster SDBM (or Redis) :)


but then, he'll supply the patch. BAZINGA!


Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Kevin A. McGrail

On 1/28/2015 9:04 AM, Reindl Harald wrote:
> my main point is that i don't want the locking IO when nothing but 
> the self-developed maintenance scripts for the bayes has any business 
> writing anything there - it should only be read, and in the best case 
> opened only once in the lifetime of each spamc-forker for best 
> performance
A) I have a feeling using Redis will provide the fastest performance 
either way...

B) Feel free to submit a patch for the feature request

Regards,
KAM


Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Reindl Harald


On 28.01.2015 at 12:11, Martin Gregorie wrote:

> On Tue, 2015-01-27 at 16:40 -0800, John Hardin wrote:
>
>> On Wed, 28 Jan 2015, Reindl Harald wrote:
>>
>>> if i understand you correctly, we agree that there is no reason /var can't be
>>> mounted read-only?
>>
>> Other than the historical practice that /var is intended to contain
>> varying data, and that implies read/write...
>
> Years ago I moved my Apache and my PostgreSQL installations from /var
> to /home. Both are happy in their new location, so I can't see why the
> same trick wouldn't work equally well for MySQL. Pick any place you
> want, e.g. its own partition, then you can mount it read-only and know
> you can't upset anything else by accident.
>
> I suspect that HR has done exactly that and symlinked the read-only
> partition into /var, which is another way of achieving the same end. The
> main reasons I moved Apache and PostgreSQL to /home were so I could back
> them up more easily and because /home has its own partition to make
> Fedora reinstalls/upgrades easier


no need to mount separate partitions on recent linux systems;
that's what namespaces are for, and systemd has easy interfaces for them

my main point is that i don't want the locking IO when nothing but the 
self-developed maintenance scripts for the bayes has any business writing 
anything there - it should only be read, and in the best case opened only 
once in the lifetime of each spamc-forker for best performance


[root@testserver:~]$ cat /etc/systemd/system/spamassassin.service
[Unit]
Description=Spamassassin Daemon
After=network.service systemd-networkd.service network-online.target
Before=postfix.service

[Service]
Environment="TMPDIR=/tmp"
PermissionsStartOnly=true
ExecStartPre=/usr/bin/find /var/lib/spamassassin/ -type d -exec /bin/chmod 0755 "{}" \;
ExecStartPre=/usr/bin/find /var/lib/spamassassin/ -type f -exec /bin/chmod 0644 "{}" \;
ExecStart=/usr/bin/spamd -c -H --max-children=10 --min-children=1 --min-spare=1 --max-spare=3 --port=10028

ExecReload=/usr/bin/kill -HUP $MAINPID
Environment="LANG=en_GB.UTF-8"
User=sa-milt
Group=sa-milt
Nice=15
StandardOutput=null
StandardError=null
SyslogFacility=mail
Restart=always
RestartSec=1

PrivateTmp=yes
PrivateDevices=yes
NoNewPrivileges=yes
CapabilityBoundingSet=~CAP_AUDIT_CONTROL CAP_AUDIT_WRITE CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_SYS_ADMIN CAP_SYS_BOOT CAP_SYS_MODULE CAP_SYS_PTRACE


ReadOnlyDirectories=/etc
ReadOnlyDirectories=/usr
ReadOnlyDirectories=/var/lib 



InaccessibleDirectories=-/var/lib/spamassassin-milter/training

InaccessibleDirectories=-/boot
InaccessibleDirectories=-/home
InaccessibleDirectories=-/media
InaccessibleDirectories=-/root
InaccessibleDirectories=-/etc/dbus-1
InaccessibleDirectories=-/etc/modprobe.d
InaccessibleDirectories=-/etc/modules-load.d
InaccessibleDirectories=-/etc/postfix
InaccessibleDirectories=-/etc/ssh
InaccessibleDirectories=-/etc/sysctl.d
InaccessibleDirectories=-/run/console
InaccessibleDirectories=-/run/dbus
InaccessibleDirectories=-/run/lock
InaccessibleDirectories=-/run/mount
InaccessibleDirectories=-/run/systemd/generator
InaccessibleDirectories=-/run/systemd/system
InaccessibleDirectories=-/run/systemd/users
InaccessibleDirectories=-/run/udev
InaccessibleDirectories=-/run/user
InaccessibleDirectories=-/usr/lib64/dbus-1
InaccessibleDirectories=-/usr/lib64/xtables
InaccessibleDirectories=-/usr/lib/dracut
InaccessibleDirectories=-/usr/libexec/iptables
InaccessibleDirectories=-/usr/libexec/openssh
InaccessibleDirectories=-/usr/libexec/postfix
InaccessibleDirectories=-/usr/lib/grub
InaccessibleDirectories=-/usr/lib/kernel
InaccessibleDirectories=-/usr/lib/modprobe.d
InaccessibleDirectories=-/usr/lib/modules
InaccessibleDirectories=-/usr/lib/modules-load.d
InaccessibleDirectories=-/usr/lib/rpm
InaccessibleDirectories=-/usr/lib/sysctl.d
InaccessibleDirectories=-/usr/lib/udev
InaccessibleDirectories=-/usr/local/scripts
InaccessibleDirectories=-/var/db
InaccessibleDirectories=-/var/lib/dbus
InaccessibleDirectories=-/var/lib/dnf
InaccessibleDirectories=-/var/lib/rpm
InaccessibleDirectories=-/var/lib/systemd
InaccessibleDirectories=-/var/lib/yum
InaccessibleDirectories=-/var/spool





Re: why does SA without autolearn need bayes read-write?

2015-01-28 Thread Martin Gregorie
On Tue, 2015-01-27 at 16:40 -0800, John Hardin wrote:
> On Wed, 28 Jan 2015, Reindl Harald wrote:
> 
> > if understand you correctly we agree that there is no reason /var can't be 
> > mounted read-only?
> 
> Other than the historical practice that /var is intended to contain 
> varying data, and that implies read/write...
> 
Years ago I moved my Apache and my PostgreSQL installations from /var
to /home. Both are happy in their new location, so I can't see why the
same trick wouldn't work equally well for MySQL. Pick any place you
want, e.g. its own partition, then you can mount it read-only and know
you can't upset anything else by accident.

I suspect that HR has done exactly that and symlinked the read-only
partition into /var, which is another way of achieving the same end. The
main reasons I moved Apache and PostgreSQL to /home were so I could back
them up more easily and because /home has its own partition to make
Fedora reinstalls/upgrades easier.


Martin







Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread John Hardin

On Wed, 28 Jan 2015, Reindl Harald wrote:

>> Setting bayes_auto_expire 0 doesn't imply the database is not going to be
>> expired. The recommended way to expire is to turn off auto-expiry and
>> expire from cron.
>
> don't understand that completely
>
> * bayes_auto_expire 0
> * which cronjob would expire


The one you write to run the expiry. No such cron job is provided with 
base SA by default, though it's possible distro packagers may add one.
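As John says, that job is something you write yourself. A minimal sketch of such a wrapper, assuming it is run from cron (for example nightly) as the user that owns the Bayes files, outside the spamd unit's read-only namespace. "sa-learn --dbpath ... --force-expire" is ordinary sa-learn usage; --dbpath overrides bayes_path, which is a filename prefix rather than a directory. The default path below is only a guess derived from the bayes_journal location in the log in the first message, so adjust it.

```python
import subprocess
import sys

# Assumed location, derived from ".../.spamassassin/bayes_journal" in the log;
# bayes_path is a prefix to which _toks, _seen and _journal are appended.
DEFAULT_DBPATH = "/var/lib/spamass-milter/.spamassassin/bayes"


def build_expire_cmd(dbpath: str = DEFAULT_DBPATH, sa_learn: str = "sa-learn") -> list[str]:
    """Command line for one manual expiry run against the given Bayes prefix."""
    return [sa_learn, "--dbpath", dbpath, "--force-expire"]


def main() -> int:
    result = subprocess.run(build_expire_cmd(), capture_output=True, text=True)
    if result.returncode != 0:
        # cron mails whatever the job prints, so pass sa-learn's complaint through
        sys.stderr.write(result.stderr)
    return result.returncode


if __name__ == "__main__":
    sys.exit(main())
```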



> * hopefully not sa-update


Nope.


>> That said, it's not really essential to have atime updates. Without
>> them the tokens would still have reasonably sensible timestamps
>> derived from the received headers of the mail used in training. It
>> wouldn't break expiry if they could be turned off.


> if i understand you correctly, we agree that there is no reason /var can't be 
> mounted read-only?


Other than the historical practice that /var is intended to contain 
varying data, and that implies read/write...


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The tree of freedom must be freshened from time to time
  with the blood of tyrants and tyrannosaurs.
 -- DW, commenting on the GM6 Lynx .50BMG bullpup
---
 Today: Wolfgang Amadeus Mozart's 259th Birthday


Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread Reindl Harald


On 28.01.2015 at 00:55, RW wrote:

On Tue, 27 Jan 2015 18:49:23 +0100
Reindl Harald wrote:


On 27.01.2015 at 17:28, Matus UHLAR - fantomas wrote:



nobody expires or updates anything in a hand-maintained bayes


The expiration you might use, which won't work without timestamps.


The intention of this *global bayes* is *not* to learn or expire
anything; the implemented "remove from bayes" method is simply to remove
the message from the corpus folder and type "sa-learn.sh rebuild".

When I say "*_auto_*" I mean exactly that, so the desired result
is to write nothing: just don't touch the Bayes DB during normal
operation and don't waste disk I/O.

bayes_auto_expire 0


Setting bayes_auto_expire 0 doesn't imply that the database is not going
to be expired. The recommended way to expire is to turn off auto-expiry
and expire from cron.


I don't understand that completely:

* bayes_auto_expire 0
* which cronjob would expire
* hopefully not sa-update

I really mean it seriously: this Bayes DB has no record that has to 
expire at any point in time until I say so for the specific message.


That's just because it was hard work to get around 16 training 
messages over 5 months. Most if not all are unlikely to become obsolete, 
and even if we decide to kill spam samples older than x months, it needs 
to be done properly to keep the 50% spam / 50% ham ratio, which is the
reason this Bayes DB works so well compared with autolearning setups: every 
one I have seen in the past 8 years got worse each month until it 
classified most ham as spam and let through the real crap.



That said, it's not really essential to have atime updates. Without
them the tokens would still have reasonably sensible timestamps
derived from the Received headers of the mail used in training. It
wouldn't break expiry if they could be turned off.


If I understand you correctly, we agree that there is no reason /var can't 
be mounted read-only?






signature.asc
Description: OpenPGP digital signature


Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread RW
On Tue, 27 Jan 2015 18:49:23 +0100
Reindl Harald wrote:

> 
> 
> On 27.01.2015 at 17:28, Matus UHLAR - fantomas wrote:

> >> nobody expires or updates anything in a hand-maintained bayes
> >
> > The expiration you might use, which won't work without timestamps.
> 
> The intention of this *global bayes* is *not* to learn or expire 
> anything; the implemented "remove from bayes" method is simply to remove
> the message from the corpus folder and type "sa-learn.sh rebuild".
> 
> When I say "*_auto_*" I mean exactly that, so the desired result
> is to write nothing: just don't touch the Bayes DB during normal
> operation and don't waste disk I/O.
> 
> bayes_auto_expire 0


Setting bayes_auto_expire 0 doesn't imply that the database is not going
to be expired. The recommended way to expire is to turn off auto-expiry
and expire from cron.

That said, it's not really essential to have atime updates. Without
them the tokens would still have reasonably sensible timestamps
derived from the Received headers of the mail used in training. It
wouldn't break expiry if they could be turned off.
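A minimal sketch of the idea above: deriving a token timestamp from a message's Received headers instead of from scan-time atime updates. The helper name `received_timestamp` is hypothetical, and it simply takes the date after the last `;` in the newest parseable Received header; SpamAssassin's own date-derivation logic is more involved, so treat this as an illustration only.

```python
from email import message_from_string
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import Optional


def received_timestamp(msg: Message) -> Optional[int]:
    """Unix time from the newest parseable Received header, or None.

    Hypothetical helper for illustration; not SpamAssassin's algorithm.
    """
    # get_all() returns headers in message order, so the first one is the
    # most recent hop (added by the last receiving MTA).
    for header in msg.get_all("Received", []):
        # In a Received header the date follows the final ';'.
        _, sep, date_part = header.rpartition(";")
        if not sep:
            continue
        try:
            when = parsedate_to_datetime(date_part.strip())
        except (TypeError, ValueError):
            # TypeError on older Pythons, ValueError on 3.10+ for bad dates.
            continue
        return int(when.timestamp())
    return None


# Usage with a made-up message (example.org/example.net are placeholders).
raw = (
    "Received: from mx.example.org by mail.example.net;"
    " Tue, 27 Jan 2015 18:49:23 +0100\n"
    "Subject: test\n\nbody\n"
)
print(received_timestamp(message_from_string(raw)))  # prints 1422380963
```

Because the timestamp comes from the training mail itself, nothing in this approach needs to touch the database while scanning.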






Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread Reindl Harald



On 27.01.2015 at 17:28, Matus UHLAR - fantomas wrote:

On 27.01.2015 at 13:44, Matus UHLAR - fantomas wrote:

On 27.01.15 03:01, Reindl Harald wrote:

with "bayes_auto_learn 0" there is no reason to lock the bayes
database and the spamd-service should be happy with
"ReadOnlyDirectories=/var/lib"


The Bayes database contains not only tokens, but also timestamps
used for expiration. That's why you need to write to them.


On 27.01.15 14:23, Reindl Harald wrote:

which expiration?

nobody expires or updates anything in a hand-maintained bayes


The expiration you might use, which won't work without timestamps.


The intention of this *global bayes* is *not* to learn or expire 
anything; the implemented "remove from bayes" method is simply to remove the 
message from the corpus folder and type "sa-learn.sh rebuild".


When I say "*_auto_*" I mean exactly that, so the desired result is 
to write nothing: just don't touch the Bayes DB during normal operation 
and don't waste disk I/O.


bayes_auto_expire 0
bayes_auto_learn 0
_

use_learner 1
use_bayes 1
use_bayes_rules 1
bayes_use_hapaxes 1
bayes_expiry_max_db_size 250
bayes_auto_expire 0
bayes_auto_learn 0
bayes_learn_during_report 0
bayes_learn_to_journal 1





Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread Matus UHLAR - fantomas

On 27.01.2015 at 13:44, Matus UHLAR - fantomas wrote:

On 27.01.15 03:01, Reindl Harald wrote:

with "bayes_auto_learn 0" there is no reason to lock the bayes
database and the spamd-service should be happy with
"ReadOnlyDirectories=/var/lib"


The Bayes database contains not only tokens, but also timestamps used for
expiration. That's why you need to write to them.


On 27.01.15 14:23, Reindl Harald wrote:

which expiration?

nobody expires or updates anything in a hand-maintained bayes


The expiration you might use, which won't work without timestamps.
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Due to unexpected conditions Windows 2000 will be released
in first quarter of year 1901


Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread Reindl Harald


On 27.01.2015 at 14:33, Axb wrote:

On 01/27/2015 02:23 PM, Reindl Harald wrote:

On 27.01.2015 at 13:44, Matus UHLAR - fantomas wrote:

On 27.01.15 03:01, Reindl Harald wrote:

with "bayes_auto_learn 0" there is no reason to lock the bayes
database and the spamd-service should be happy with
"ReadOnlyDirectories=/var/lib"


The Bayes database contains not only tokens, but also timestamps used
for expiration. That's why you need to write to them.


which expiration?

nobody expires or updates anything in a hand-maintained bayes

use_bayes 1
bayes_auto_expire 0
bayes_auto_learn 0



would this help?

use_learner 0


No, that leads to Bayes not being used at all:

result: . -2 - ALL_TRUSTED scantime=0.1
result: . 0 - ALL_TRUSTED,BAYES_50 scantime=0.5
_

I haven't tested further whether it also has an impact on sa-learn on the 
shell. IMHO, when "bayes_auto_expire", "bayes_auto_learn" and 
"bayes_learn_during_report" are all 0, there should be no locking,


not only because of the permissions, but also because of wasted disk I/O.
_

That leads to Bayes not being used at all;
"use_learner 0" was the only difference from production:

use_learner 0
use_bayes 1
use_bayes_rules 1
bayes_use_hapaxes 1
bayes_expiry_max_db_size 250
bayes_auto_expire 0
bayes_auto_learn 0
bayes_learn_during_report 0
bayes_learn_to_journal 1





Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread jpff



On Tue, 27 Jan 2015, Reindl Harald wrote:



nobody expires or updates anything in a hand-maintained bayes



Just a message from nobody (important), apparently
==John ff



Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread Axb

On 01/27/2015 02:23 PM, Reindl Harald wrote:


On 27.01.2015 at 13:44, Matus UHLAR - fantomas wrote:

On 27.01.15 03:01, Reindl Harald wrote:

with "bayes_auto_learn 0" there is no reason to lock the bayes
database and the spamd-service should be happy with
"ReadOnlyDirectories=/var/lib"


The Bayes database contains not only tokens, but also timestamps used
for expiration. That's why you need to write to them.


which expiration?

nobody expires or updates anything in a hand-maintained bayes

use_bayes 1
bayes_auto_expire 0
bayes_auto_learn 0



would this help?

use_learner 0



Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread Reindl Harald


On 27.01.2015 at 13:44, Matus UHLAR - fantomas wrote:

On 27.01.15 03:01, Reindl Harald wrote:

with "bayes_auto_learn 0" there is no reason to lock the bayes
database and the spamd-service should be happy with
"ReadOnlyDirectories=/var/lib"


The Bayes database contains not only tokens, but also timestamps used for
expiration. That's why you need to write to them.


which expiration?

nobody expires or updates anything in a hand-maintained bayes

use_bayes 1
bayes_auto_expire 0
bayes_auto_learn 0





Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread Benny Pedersen

Matus UHLAR - fantomas wrote on 2015-01-27 13:44:

On 27.01.15 03:01, Reindl Harald wrote:
with "bayes_auto_learn 0" there is no reason to lock the bayes 
database and the spamd-service should be happy with 
"ReadOnlyDirectories=/var/lib"


The Bayes database contains not only tokens, but also timestamps used 
for expiration. That's why you need to write to them.


SpamAssassin needs an extension to support a separate DBI:foo for 
WRITEONLY and another DBI:bar for READONLY. sqlgrey can do it, but 
SpamAssassin can't yet.


This could be extended to all DBI: databases used in SpamAssassin; it should 
be fairly simple to make that work.
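The proposed split between a read-only handle and a write handle can be sketched with SQLite standing in for a DBI backend. SpamAssassin has no such option today, and the class, function and table names below are hypothetical. Opening the reader with the URI parameter `mode=ro` makes SQLite reject any write made through that connection, which is the behaviour a scan-only process would want.

```python
import sqlite3
from typing import Dict, Iterable, Tuple


class SplitTokenStore:
    """Hypothetical token store with separate read-only and read-write handles."""

    def __init__(self, path: str) -> None:
        self.path = path

    def reader(self) -> sqlite3.Connection:
        # mode=ro: SQLite refuses any write made through this connection.
        return sqlite3.connect(f"file:{self.path}?mode=ro", uri=True)

    def writer(self) -> sqlite3.Connection:
        # Plain read-write connection, e.g. for sa-learn style training.
        conn = sqlite3.connect(self.path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tokens "
            "(token TEXT PRIMARY KEY, spam INTEGER NOT NULL, ham INTEGER NOT NULL)"
        )
        return conn


def learn(conn: sqlite3.Connection, token: str, is_spam: bool) -> None:
    # column comes from a fixed choice, so the f-string is not injectable.
    column = "spam" if is_spam else "ham"
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR IGNORE INTO tokens (token, spam, ham) VALUES (?, 0, 0)",
            (token,),
        )
        conn.execute(
            f"UPDATE tokens SET {column} = {column} + 1 WHERE token = ?", (token,)
        )


def lookup(conn: sqlite3.Connection, tokens: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    # Read-only lookup: returns (spam, ham) counts for known tokens.
    result: Dict[str, Tuple[int, int]] = {}
    for tok in tokens:
        row = conn.execute(
            "SELECT spam, ham FROM tokens WHERE token = ?", (tok,)
        ).fetchone()
        if row is not None:
            result[tok] = row
    return result
```

A scanner that only ever calls `reader()` would never modify the token data. Whether it could then live on a read-only mount also depends on the backend; SQLite's WAL journal mode, for example, needs a writable directory.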


Re: why does SA without autolearn need bayes read-write?

2015-01-27 Thread Matus UHLAR - fantomas

On 27.01.15 03:01, Reindl Harald wrote:
with "bayes_auto_learn 0" there is no reason to lock the bayes 
database and the spamd-service should be happy with 
"ReadOnlyDirectories=/var/lib"


The Bayes database contains not only tokens, but also timestamps used for
expiration. That's why you need to write to them.
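To illustrate why scanning can write even with learning switched off, here is a toy model (not SpamAssassin's code) of a token store in which classification refreshes each matched token's access time, and expiry later removes tokens whose access time is older than a cutoff. The class name `ToyBayesStore` and the simple cutoff-based expiry are assumptions for illustration; SA's real expiry logic is more elaborate and is driven by settings such as `bayes_expiry_max_db_size`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class TokenRecord:
    spam_count: int
    ham_count: int
    atime: int  # last time the token was learned or seen during a scan


class ToyBayesStore:
    """Toy token database (not SpamAssassin's implementation)."""

    def __init__(self) -> None:
        self.tokens: Dict[str, TokenRecord] = {}
        self.writes = 0  # counts every modification of the store

    def learn(self, tokens: Iterable[str], is_spam: bool, now: int) -> None:
        # Explicit training (as with sa-learn) obviously needs write access.
        for tok in set(tokens):
            rec = self.tokens.setdefault(tok, TokenRecord(0, 0, now))
            if is_spam:
                rec.spam_count += 1
            else:
                rec.ham_count += 1
            rec.atime = max(rec.atime, now)
            self.writes += 1

    def classify(self, tokens: Iterable[str], now: int) -> int:
        # Nothing is learned here, yet refreshing atime is still a write.
        matched = 0
        for tok in set(tokens):
            rec = self.tokens.get(tok)
            if rec is None:
                continue
            matched += 1
            if rec.atime < now:
                rec.atime = now
                self.writes += 1
        return matched

    def expire(self, cutoff: int) -> int:
        # Drop tokens not seen since `cutoff`; only possible because atime exists.
        stale = [tok for tok, rec in self.tokens.items() if rec.atime < cutoff]
        for tok in stale:
            del self.tokens[tok]
        self.writes += len(stale)
        return len(stale)
```

In this model a read-only store would make every `classify` call that refreshes a timestamp fail, even though no learning takes place.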

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
WinError #9: Out of error messages.