Re: sa-learn --ham ground rules

2008-02-14 Thread Michelle Konzack
On 2008-02-13 10:04:36, Matus UHLAR - fantomas wrote:
> you can just provide the directory name. sa-learn will then scan the
> directory w/o args limit

Sorry, but I have used "--dir" for ages, and if I have more than 1200-1400
messages, sa-learn exits with an error message saying I have exceeded the
limits...

Thanks, Greetings and nice Day
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: sa-learn --ham ground rules

2008-02-14 Thread Michelle Konzack
On 2008-02-13 05:14:38, Karsten Bräckelmann wrote:
> On Sun, 2008-02-10 at 21:34 +0100, Michelle Konzack wrote:
> > On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > > So what is the maximum number of files in a directory that one can
> > > > feed to sa-learn --ham and expect it to achieve normal speed?
> > > 
> > > Dunno if there are limitations -- however, your 7k messages should be
> > > perfectly fine. Just ran a test on a 6k messages mbox file, and there
> > > was no noticeable difference to a 30 messages test.
> > 
> > Yeah, you can even feed 200,000 spams from the Debian lists to it IF
> > YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH, which
> > is slightly different, and he seems to exceed the ARGS limit.
> 
> Nope.  The command fragment Gene cared to show us did *not* have any
> wildcard, but a dir. No bash filename expansion, no limit exceeded.

Right, but if I run 'sa-learn --spam --dir ...', sa-learn exits with
an error message saying that I have tried to scan too many messages (or
something similar), which means Perl had exceeded its limits.

Thanks, Greetings and nice Day
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: sa-learn --ham ground rules

2008-02-13 Thread Matus UHLAR - fantomas
> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > So what is the maximum number of files in a directory that one can
> > > feed to sa-learn --ham and expect it to achieve normal speed?
> > 
> > Dunno if there are limitations -- however, your 7k messages should be
> > perfectly fine. Just ran a test on a 6k messages mbox file, and there
> > was no noticeable difference to a 30 messages test.

On 10.02.08 21:34, Michelle Konzack wrote:
> Yeah, you can even feed 200,000 spams from the Debian lists to it IF
> YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH, which
> is slightly different, and he seems to exceed the ARGS limit.

You can just provide the directory name. sa-learn will then scan the
directory itself, without hitting the args limit.
-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
M$ Win's are shit, do not use it !


Re: sa-learn --ham ground rules

2008-02-13 Thread Matus UHLAR - fantomas
> On 2008-02-08 01:49:52, Gene Heskett wrote:
> > So what is the maximum number of files in a directory that one can feed to
> > sa-learn --ham and expect it to achieve normal speed?  I vaguely recall
> > feeding it my corpus of another folder it was having trouble with a year
> > ago, the linux-usb list, 600 to 1k messages in it and it was finished in
> > an hour that time.

On 10.02.08 21:31, Michelle Konzack wrote:
> Many programs including "rm", "mv", "ls" and "sa-learn" have a limit on
> the command-line arguments, which is around 1200 to 1400.

It's not a limit of those programs; it's a system limit on how much data can
be passed in arguments. And they don't exit; it's the shell that fails to
execute such a process.

> Note:  "sa-learn" exits automatically if you feed too many messages to it.

No. sa-learn, afaik, doesn't count that.
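
The point about the system limit can be checked from a shell. This is only a rough sketch, assuming a system that provides `getconf`; the helper name `args_fit` and the "half of ARG_MAX" safety margin are made up for illustration, and the real accounting in the kernel (environment, pointer arrays, per-argument limits) is more involved.

```shell
#!/bin/sh
# Kernel limit on the combined size of arguments + environment for one exec().
# This is a system value, not something rm, ls or sa-learn define.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"

# Hypothetical helper: succeed if a list of names (one per line on stdin)
# would roughly fit in a single exec(), keeping half of ARG_MAX in reserve
# for the environment. Each argument costs its length plus a terminating
# NUL; wc -c counts the newline of each line, which stands in for that byte.
args_fit() {
    used=$(( $(wc -c) ))
    [ "$used" -lt $((arg_max / 2)) ]
}

# Usage (the path is a placeholder):
#   ls /path/to/Maildir/cur | args_fit && echo "a wildcard would still fit"
```

Note that this only matters when the shell expands a wildcard into thousands of file names. Passing the directory itself, as suggested above, hands sa-learn a single argument.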
-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.


Re: sa-learn --ham ground rules

2008-02-12 Thread Gene Heskett
On Tuesday 12 February 2008, John Hardin wrote:
>On Tue, 12 Feb 2008, Gene Heskett wrote:
>> On Sunday 10 February 2008, Michelle Konzack wrote:
>>> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
>>>>> So what is the maximum number of files in a directory that one can feed
>>>>> to sa-learn --ham and expect it to achieve normal speed?
>>>>
>>>> Dunno if there are limitations -- however, your 7k messages should be
>>>> perfectly fine. Just ran a test on a 6k messages mbox file, and there
>>>> was no noticeable difference to a 30 messages test.
>>>
>>> Yeah, you can even feed 200,000 spams from the Debian lists to it IF
>>> YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH, which
>>> is slightly different, and he seems to exceed the ARGS limit.
>>>
>>> Thanks, Greetings and nice Day
>>>Michelle Konzack
>>>Systemadministrator
>>>Tamay Dogan Network
>>>Debian GNU/Linux Consultant
>>
>> Guilty, it's all in Maildir format.
>
>xargs then?

Looks interesting if I can grok how to use it.  Another day though, my plate 
runneth over till the weekend now.  Thanks John.
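
For anyone else wondering how xargs would be used here, a minimal sketch follows. It assumes GNU-style `find` and `xargs` (`-print0`, `-0`, `-r`); the function name `learn_dir` and the batch size of 500 are illustrative choices, not anything sa-learn requires. As noted elsewhere in the thread, simply giving sa-learn the directory name avoids the argument limit without any of this.

```shell
#!/bin/sh
# learn_dir CLASS DIR [CMD...]
# Feed every regular file under DIR to the learner in batches.
# CLASS is "ham" or "spam". CMD defaults to sa-learn; it is a parameter
# only so the batching can be tried out with a harmless stand-in.
learn_dir() {
    class=$1
    dir=$2
    shift 2
    [ $# -gt 0 ] || set -- sa-learn
    # -print0 / -0 keep unusual file names intact; -n 500 caps the batch
    # size (xargs also splits on its own before reaching the system limit).
    # -r skips the run entirely when no files are found.
    find "$dir" -type f -print0 | xargs -0 -r -n 500 "$@" "--$class"
}

# Dry run that only prints the command lines that would be built:
#   learn_dir ham /path/to/Maildir/cur echo
# Real run:
#   learn_dir ham /path/to/Maildir/cur
```

Each batch starts a fresh sa-learn process, so this is likely slower than one run over the whole directory.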

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
What an author likes to write most is his signature on the back of a cheque.
-- Brendan Francis


Re: sa-learn --ham ground rules

2008-02-12 Thread Karsten Bräckelmann
On Sun, 2008-02-10 at 21:34 +0100, Michelle Konzack wrote:
> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > So what is the maximum number of files in a directory that one can
> > > feed to sa-learn --ham and expect it to achieve normal speed?
> > 
> > Dunno if there are limitations -- however, your 7k messages should be
> > perfectly fine. Just ran a test on a 6k messages mbox file, and there
> > was no noticeable difference to a 30 messages test.
> 
> Yeah, you can even feed 200,000 spams from the Debian lists to it IF
> YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH, which
> is slightly different, and he seems to exceed the ARGS limit.

Nope.  The command fragment Gene cared to show us did *not* have any
wildcard, but a dir. No bash filename expansion, no limit exceeded.

  guenther





Re: sa-learn --ham ground rules

2008-02-12 Thread Theo Van Dinter
On Tue, Feb 12, 2008 at 04:04:28PM -0500, Gene Heskett wrote:
> Guilty, its all in Mail dir format.

--dir ?

-- 
Randomly Selected Tagline:
"There are all of these warnings and incantations and unnatural rituals
 and everything's veiled in this threat of "you mess with the mayo,
 the mayo mess with you, man."   - Alton Brown, Good Eats, "Mayo Clinc"




Re: sa-learn --ham ground rules

2008-02-12 Thread John Hardin

On Tue, 12 Feb 2008, Gene Heskett wrote:
> On Sunday 10 February 2008, Michelle Konzack wrote:
>> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
>>>> So what is the maximum number of files in a directory that one can feed
>>>> to sa-learn --ham and expect it to achieve normal speed?
>>>
>>> Dunno if there are limitations -- however, your 7k messages should be
>>> perfectly fine. Just ran a test on a 6k messages mbox file, and there
>>> was no noticeable difference to a 30 messages test.
>>
>> Yeah, you can even feed 200,000 spams from the Debian lists to it IF
>> YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH, which
>> is slightly different, and he seems to exceed the ARGS limit.
>>
>> Thanks, Greetings and nice Day
>>    Michelle Konzack
>>    Systemadministrator
>>    Tamay Dogan Network
>>    Debian GNU/Linux Consultant
>
> Guilty, it's all in Maildir format.

xargs then?

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  ...in the 2nd amendment the right to arms clause means you have
  the right to choose how many arms you want, and the militia clause
  means that Congress can punish you if the answer is "none."
-- David Hardy, 2nd Amendment scholar
---
 Today: Abraham Lincoln's and Charles Darwin's 199th Birthdays

Re: sa-learn --ham ground rules

2008-02-12 Thread Gene Heskett
On Sunday 10 February 2008, Michelle Konzack wrote:
>On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
>> > So what is the maximum number of files in a directory that one can feed
>> > to sa-learn --ham and expect it to achieve normal speed?
>>
>> Dunno if there are limitations -- however, your 7k messages should be
>> perfectly fine. Just ran a test on a 6k messages mbox file, and there
>> was no noticeable difference to a 30 messages test.
>
>Yeah, you can even feed 200,000 spams from the Debian lists to it IF
>YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH, which
>is slightly different, and he seems to exceed the ARGS limit.
>
>Thanks, Greetings and nice Day
>Michelle Konzack
>Systemadministrator
>Tamay Dogan Network
>Debian GNU/Linux Consultant

Guilty, it's all in Maildir format.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
new, adj.:
Different color from previous model.


Re: sa-learn --ham ground rules

2008-02-12 Thread Michelle Konzack
On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > So what is the maximum number of files in a directory that one can feed to 
> > sa-learn --ham and expect it to achieve normal speed?
> 
> Dunno if there are limitations -- however, your 7k messages should be
> perfectly fine. Just ran a test on a 6k messages mbox file, and there
> was no noticeable difference to a 30 messages test.

Yeah, you can even feed 200,000 spams from the Debian lists to it IF
YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH, which
is slightly different, and he seems to exceed the ARGS limit.

Thanks, Greetings and nice Day
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
   50, rue de Soultz MSN LinuxMichi
0033/6/6192519367100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: sa-learn --ham ground rules

2008-02-12 Thread Michelle Konzack
On 2008-02-08 01:49:52, Gene Heskett wrote:
> So what is the maximum number of files in a directory that one can feed to 
> sa-learn --ham and expect it to achieve normal speed?  I vaguely recall 
> feeding it my corpus of another folder it was having trouble with a year ago, 
> the linux-usb list, 600 to 1k messages in it and it was finished in an hour 
> that time.

Many programs including "rm", "mv", "ls" and "sa-learn" have a limit on
the command-line arguments, which is around 1200 to 1400.

So I would never try to feed more than 1000 messages/files at once to it.

Note:  "sa-learn" exits automatically if you feed too many messages to it.

Thanks, Greetings and nice Day
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
   50, rue de Soultz MSN LinuxMichi
0033/6/6192519367100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: sa-learn --ham ground rules

2008-02-09 Thread Chris Hoogendyk



Gene Heskett wrote:
> On Saturday 09 February 2008, jdow wrote:
>> From: "John Hardin" <[EMAIL PROTECTED]>
>> Sent: Friday, 2008, February 08 21:03
>>
>>> Gene Heskett sez:
>>>> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.
>>>
>>> There's no fool like an old fool.
>>
>> I'm close enough to Gene's age and have known him long enough I get
>> the right to rap his knuckles. Hm, in about a year that advances a
>> step to rap his knuckles with an iron bar?
>>
>> {^_-}
>
> Ouch, that would hurt my arthritic joints something terrible. Can it wait
> till I've had a chance to hit my thumbs with another cortisone shot?  On
> second thought, the iron bar is less painful in the short term.  The last
> time I checked, they wanted to do surgery at $5k per thumb and I said how
> about cortisone?  He said then (15 years ago) that it was $60 a shot, and
> it would hurt like hell.  He was right on both counts, but that thumb
> still works today.  Now its the other ones turn I guess.  :)

Hmm. Hurt like hell? I think that's very doctor-specific. I got a shot that
was eased in slowly, front-loaded with lidocaine, back-loaded with
cortisone. It was almost painless; the pain I was experiencing before
the shot disappeared almost immediately due to the lidocaine, and then
disappeared on a more ongoing basis due to the cortisone. Magic.




---

Chris Hoogendyk

-
  O__   Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst 


<[EMAIL PROTECTED]>

--- 


Erdös 4




Re: sa-learn --ham ground rules

2008-02-09 Thread Gene Heskett
On Saturday 09 February 2008, jdow wrote:
>From: "John Hardin" <[EMAIL PROTECTED]>
>Sent: Friday, 2008, February 08 21:03
>
>> Gene Heskett sez:
>>> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.
>>
>> There's no fool like an old fool.
>
>I'm close enough to Gene's age and have known him long enough I get
>the right to rap his knuckles. Hm, in about a year that advances a
>step to rap his knuckles with an iron bar?
>
>{^_-}

Ouch, that would hurt my arthritic joints something terrible. Can it wait till 
I've had a chance to hit my thumbs with another cortisone shot?  On second 
thought, the iron bar is less painful in the short term.  The last time I 
checked, they wanted to do surgery at $5k per thumb and I said how about 
cortisone?  He said then (15 years ago) that it was $60 a shot, and it would 
hurt like hell.  He was right on both counts, but that thumb still works 
today.  Now it's the other one's turn I guess.  :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
FORCE YOURSELF TO RELAX!


Re: sa-learn --ham ground rules

2008-02-08 Thread jdow

From: "John Hardin" <[EMAIL PROTECTED]>
Sent: Friday, 2008, February 08 21:03


> Gene Heskett sez:
>> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.
>
> There's no fool like an old fool.


I'm close enough to Gene's age and have known him long enough I get
the right to rap his knuckles. Hm, in about a year that advances a
step to rap his knuckles with an iron bar?

{^_-}


Re: sa-learn --ham ground rules

2008-02-08 Thread Gene Heskett
On Saturday 09 February 2008, John Hardin wrote:
>Gene Heskett sez:
>> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.
>
>There's no fool like an old fool.

And that's why they pay me the big bucks when something really goes aglay at 
the tv station even if I have been semi-retired since mid 2002.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Lost: gray and white female cat.  Answers to electric can opener.


Re: sa-learn --ham ground rules

2008-02-08 Thread John Hardin

Gene Heskett sez:
>
> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.

There's no fool like an old fool.

-- 
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #20: The faster you finish the fight,
  the less shot you will get.
---
 4 days until Abraham Lincoln's and Charles Darwin's 199th Birthdays



Re: sa-learn --ham ground rules

2008-02-08 Thread Gene Heskett
On Friday 08 February 2008, jdow wrote:
>From: "Gene Heskett" <[EMAIL PROTECTED]>
>Sent: Friday, 2008, February 08 16:43
>
>> On Friday 08 February 2008, Karsten Bräckelmann wrote:
>>>On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
>>>> The command that kmail issues to it is:
>>>> sa-learn --ham  /root/Mail/(foldername)/cur
>>>
>>>You're not using root as your ordinary user account, do you !?
>>>
>>>  guenther
>>
>> In fact I do, but I have myself somewhat in a sandbox as all the mail
>> handling stuff except kmail runs as an unprivileged user, and kmail pulls
>> incoming from that mailbox in /var.  I've been doing that for about 2-3 of
>> years, started it back at FC2.  And running as root since RH5.1.  Yeah,
>> I'm an un-repentant old fart.
>
>Gene, how many times have I told you "don't do that with Linux"? Old fart
>or not, you broke it. Now you get to fix it.

Ya mean I get to keep all the pieces?  Oh goodie.

>You declared you run all mail handling as an unprivileged user. Then you try
>to run sa-learn as root on root's mailbox. There is rather a problem there
>if you think about it. Sit and ruminate a few minutes. Seriously - think.
>
>Have you considered what you are doing and where at least one
>hyper-obvious problem lies?
>
>If you're not banging your forehead and screaming "Doh!" by now here is the
>clue bat. If you've figured it out, "DUCK!"

Quack?

>SpamAssassin will be using a Bayes database collected as that unprivileged
>user. It cannot use one generated as root and placed in root's directory
>structure. The last I knew you were trying to use per user Bayes.

That I wouldn't bet on, but spamassassin's kids are running as gene, called 
into service by procmail also running as gene, so I'd have to assume the 
applicable bayes database it's using is the one in /home/gene.
 
>So sa-learn as root will build the file in a place the unprivileged process
>cannot access AND will likely leave the file with privileges that prevent
>access by that unprivileged user.
>
>That's issue one for you to fix.
>
>And if you don't fix your errant ways mama's gonna whup you good.

Good, when you get done I'll buy.

>What size machine are you trying to work on? How deep into your swap
>file are you when you run sa-learn?

xp-2800, a gig of ram, 2 of swap.  Swap is very rarely touched.

>{^_^}   Joanne, ashamed I've known you all these years, Gene. You shame
>me by not taking advice repeated virtually every time we
>communicate.

It figures, Joanne would have to see how long she can balance on the soap 
box. ;-)  At least here in Weston, our 'free speech stump', the stump of a 4+ 
foot diameter Grand Old Man that was here & probably 50 feet tall when the 
war between the states was a current event, has a guard rail now, and since 
they left it about 4 feet tall, has a set of steps so even an old fart like 
me can make it up onto it should I feel the urge to make a speech.

However, I see what you are saying, both about perms and locations.  
Excellent points, I'll see what I can figure out toward making that database 
belong to me instead of root.  Obviously I didn't carry that conversion to 
user nearly far enough, so I deserve the knuckle rap.

How about I change that kmail filter rule to use:
runcon -l gene sa-learn --spam /path/to/spam
and:
runcon -l gene sa-learn --ham /path/to/ham

Now, I note that the /home/gene/.spamassassin/bayes* stuff is carrying a very 
current time stamp,

# ls -l
total 53332
-rw------- 1 gene gene 20983808 2008-02-08 23:27 auto-whitelist
-rw-rw-rw- 1 gene gene        6 2008-01-03 02:37 auto-whitelist.mutex
-rw------- 1 gene gene    26616 2008-02-08 23:27 bayes_journal
-rw-rw-rw- 1 gene gene   147750 2008-01-03 02:37 bayes.mutex
-rw------- 1 gene gene 41889792 2008-02-08 23:27 bayes_seen
-rw------- 1 gene gene  5292032 2008-02-08 23:27 bayes_toks
-rw-r--r-- 1 gene gene      934 2005-12-14 16:58 init.pre
-rw-r--r-- 1 gene gene     1164 2006-01-16 13:45 user_prefs
-rw-r--r-- 1 gene gene     2397 2005-12-14 16:58 v310.pre

so apparently it is doing some self-learning?

Many thanks girl.  I will get it sorted.
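
A note on the ownership question, offered as a sketch rather than a tested recipe. As far as I know, `runcon` sets the SELinux security context of a command and does not change the Unix user it runs as, so the usual tools for the latter would be `sudo -u` or `su`. The helper below, whose name `not_owned_by` is made up here, lists entries in a directory that do not belong to the expected user, which is one way to spot Bayes files left behind by a root run. It assumes a `find` that supports `-maxdepth` and `-mindepth`.

```shell
#!/bin/sh
# not_owned_by USER DIR
# Print entries directly inside DIR that are NOT owned by USER
# (USER may be a login name or a numeric uid).
not_owned_by() {
    find "$2" -mindepth 1 -maxdepth 1 ! -user "$1" -print
}

# Usage, with the paths from the listing above:
#   not_owned_by gene /home/gene/.spamassassin
#
# Running the learner as that user instead of root (assumes sudo is
# installed and configured; /path/to/ham is the placeholder from above):
#   sudo -u gene -H sa-learn --ham /path/to/ham
```

An empty result means every file in the directory already belongs to that user.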

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
There is hardly a thing in the world that some man can not make a little
worse and sell a little cheaper.


Re: sa-learn --ham ground rules

2008-02-08 Thread jdow

From: "Gene Heskett" <[EMAIL PROTECTED]>
Sent: Friday, 2008, February 08 16:43



> On Friday 08 February 2008, Karsten Bräckelmann wrote:
>> On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
>>> The command that kmail issues to it is:
>>> sa-learn --ham  /root/Mail/(foldername)/cur
>>
>> You're not using root as your ordinary user account, do you !?
>>
>>  guenther
>
> In fact I do, but I have myself somewhat in a sandbox as all the mail
> handling stuff except kmail runs as an unprivileged user, and kmail pulls
> incoming from that mailbox in /var.  I've been doing that for about 2-3 of
> years, started it back at FC2.  And running as root since RH5.1.  Yeah,
> I'm an un-repentant old fart.

Gene, how many times have I told you "don't do that with Linux"? Old fart
or not, you broke it. Now you get to fix it.

You declared you run all mail handling as an unprivileged user. Then you try
to run sa-learn as root on root's mailbox. There is rather a problem there
if you think about it. Sit and ruminate a few minutes. Seriously - think.

Have you considered what you are doing and where at least one
hyper-obvious problem lies?

If you're not banging your forehead and screaming "Doh!" by now here is the
clue bat. If you've figured it out, "DUCK!"

SpamAssassin will be using a Bayes database collected as that unprivileged
user. It cannot use one generated as root and placed in root's directory
structure. The last I knew you were trying to use per user Bayes. So
sa-learn as root will build the file in a place the unprivileged process
cannot access AND will likely leave the file with privileges that prevent
access by that unprivileged user.

That's issue one for you to fix.

And if you don't fix your errant ways mama's gonna whup you good.

What size machine are you trying to work on? How deep into your swap
file are you when you run sa-learn?

{^_^}   Joanne, ashamed I've known you all these years, Gene. You shame
   me by not taking advice repeated virtually every time we
   communicate.



Re: sa-learn --ham ground rules

2008-02-08 Thread Gene Heskett
On Friday 08 February 2008, Karsten Bräckelmann wrote:
>On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
>> The sa-learn --spam can process a message in 5 to 10 seconds or so, so if
>> I've dropped 20 doofus mails in the spam directory and fire it off, I have
>> it done and kmail is back among the living in 2-3 minutes.
>
>This seems *way* too high. If there have been only 20 messages total in
>that folder, sa-learn should have processed these in a few *seconds* or
>less.
>
>> But, feeding it a 'ham' directory with about 7k messages in it, turned
>> sa-learn into a 100% cpu hog, [...]
>
>What did you expect? Based on your numbers above, processing that folder
>would have taken 10-20 *hours*...
>
>> incrementing the message processed number only
>> about every 3 to 5 minutes. I couldn't kill it, it kept coming back and I
>> must have fed it a kill -9 50 times.
>
>Hmm. Kmail doesn't start one process per mail by any chance?
>
>> So what is the maximum number of files in a directory that one can feed to
>> sa-learn --ham and expect it to achieve normal speed?
>
>Dunno if there are limitations -- however, your 7k messages should be
>perfectly fine. Just ran a test on a 6k messages mbox file, and there
>was no noticeable difference to a 30 messages test.
>
>> The command that kmail issues to it is:
>> sa-learn --ham  /root/Mail/(foldername)/cur
>
>You're not using root as your ordinary user account, do you !?
>
>  guenther

In fact I do, but I have myself somewhat in a sandbox as all the mail handling 
stuff except kmail runs as an unprivileged user, and kmail pulls incoming 
from that mailbox in /var.  I've been doing that for about 2-3 years, 
started it back at FC2.  And running as root since RH5.1.  Yeah, I'm an 
un-repentant old fart.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
CPU needs recalibration


Re: sa-learn --ham ground rules

2008-02-08 Thread Karsten Bräckelmann
On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
> The sa-learn --spam can process a message in 5 to 10 seconds or so, so if
> I've dropped 20 doofus mails in the spam directory and fire it off, I have
> it done and kmail is back among the living in 2-3 minutes.

This seems *way* too high. If there have been only 20 messages total in
that folder, sa-learn should have processed these in a few *seconds* or
less.


> But, feeding it a 'ham' directory with about 7k messages in it, turned 
> sa-learn into a 100% cpu hog, [...]

What did you expect? Based on your numbers above, processing that folder
would have taken 10-20 *hours*...
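
The arithmetic behind that estimate, as a small sketch. The function name `estimate_hours` is made up for illustration; the inputs are the 7k messages and the 5 to 10 seconds per message quoted above.

```shell
#!/bin/sh
# estimate_hours MESSAGES SECONDS_PER_MESSAGE
# Print the expected wall-clock time in hours, to one decimal place.
estimate_hours() {
    LC_ALL=C awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n * s / 3600 }'
}

estimate_hours 7000 5     # lower bound, prints 9.7
estimate_hours 7000 10    # upper bound, prints 19.4
```

So at the reported per-message rate, the run would indeed land in the 10 to 20 hour range; the real question is why each message takes seconds at all.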


> incrementing the message processed number only 
> about every 3 to 5 minutes. I couldn't kill it, it kept coming back and I 
> must have fed it a kill -9 50 times.

Hmm. Kmail doesn't start one process per mail by any chance?


> So what is the maximum number of files in a directory that one can feed to 
> sa-learn --ham and expect it to achieve normal speed?

Dunno if there are limitations -- however, your 7k messages should be
perfectly fine. Just ran a test on a 6k messages mbox file, and there
was no noticeable difference to a 30 messages test.


> The command that kmail issues to it is:
> sa-learn --ham  /root/Mail/(foldername)/cur

You're not using root as your ordinary user account, are you!?

  guenther





sa-learn --ham ground rules

2008-02-07 Thread Gene Heskett
Greetings;

About an hour ago, based on some comments made that the bayes database needs 
to be trained on ham as well as spam, and because it seemed to be forgetting 
some of the stuff I'd fed it as spam, I re-wrote that filter rule in kmail to 
launch it using one of my sorted directories from a mailing list as the 
argument.  Syntax otherwise the same as the sa-learn-spam filter.

The sa-learn --spam can process a message in 5 to 10 seconds or so, so if I've 
dropped 20 doofus mails in the spam directory and fire it off, I have it done 
and kmail is back among the living in 2-3 minutes.

But, feeding it a 'ham' directory with about 7k messages in it, turned 
sa-learn into a 100% cpu hog, incrementing the message processed number only 
about every 3 to 5 minutes. I couldn't kill it, it kept coming back and I 
must have fed it a kill -9 50 times.  Finally, one of the kills killed X too!  
But no console came back, so I had to hit the reset button.  The reboot was 
like molasses in January, so I did a power down, same story.  Same story 3 
times running, so I went and made a sandwich while it sat powered down.  Then 
the reboot was normal up to e2fscking a 372GB drive I use for amanda, the 
backup proggy.  That hung, with no indication of progress for about 20 
minutes, no marching  or anything.  But it finally fell through and 
completed the bootup, and is running normally now, but it has taken the 
majority of an hour to do this.

So what is the maximum number of files in a directory that one can feed to 
sa-learn --ham and expect it to achieve normal speed?  I vaguely recall 
feeding it my corpus of another folder it was having trouble with a year ago, 
the linux-usb list, 600 to 1k messages in it and it was finished in an hour 
that time.

The command that kmail issues to it is:
sa-learn --ham  /root/Mail/(foldername)/cur

Where foldername is whatever mailing list I want to tell it is ham.

Is this correct?  I've had it set up that way for 2 or 3 years at least and 
till now it hasn't been that much of a problem.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"What a wonder is USENET; such wholesale production of conjecture from
such a trifling investment in fact."
-- Carl S. Gutekunst


RE: BUG? sa-learn --ham vs spamassassin -r different results

2006-09-09 Thread Michael Scheidell
> -----Original Message-----
> From: Matt Kettler [mailto:[EMAIL PROTECTED] 
> Sent: Friday, September 08, 2006 8:20 PM
> To: Michael Scheidell
> Cc: users@spamassassin.apache.org
> Subject: Re: BUG? sa-learn --ham vs spamassassin -r different results
> 
> Second, You mis-understood my question. I know you erased 
> your bayes DB.
> 
>  I was asking if the message had already been scanned, and 
> thus marked, by SA. I'm asking a question about the 
> text-content of the message, not about the contents of your bayes DB.

No, it was not scanned by SA, there was no text markup in it, and I tried
with and without the -d option to spamassassin.

Further, spamassassin sent FEWER tokens to the db than sa-learn (which has no
-d option).

Clearly, the two are using different routines to score and classify tokens.


Re: BUG? sa-learn --ham vs spamassassin -r different results

2006-09-08 Thread Matt Kettler
Michael Scheidell wrote:
> Matt Kettler wrote:
>>> Further, spamassassin -r and sa-learn --spam learn differently, give
>>> different results:
>>
>> By any chance was the message used scanned by SA already?
>>
>> I'm wondering if it's a bug where spamassassin -r is stripping markups,
>> but sa-learn is not.
>
> no, see my test methodology
> I erased the file and created a blank one.

First, sorry for the delay, my internet service got knocked out.

Second, You mis-understood my question. I know you erased your bayes DB.

 I was asking if the message had already been scanned, and thus marked,
by SA. I'm asking a question about the text-content of the message, not
about the contents of your bayes DB.


RE: BUG? sa-learn --ham vs spamassassin -r different results

2006-09-06 Thread Gary V

>> I'm wondering if it's a bug where spamassassin -r is
>> stripping markups, but sa-learn is not.
>
> Spamassassin has a -d (remove-markup) option, but sa-learn does not.
>
> I am not using the -d option, furthermore, sa-learn 'learns' more
> tokens.
>
> A lot more.
>
> I would suspect that sa with -d would learn even less.
>
> Just checked:  email I sent through does not have any spam markup in it,
> no X-Spam, or virus checked headers, etc.

My test includes amavisd-new x-spam headers. After learning this message via 
both methods, it is apparent that some tokens were learned only once and 
others were learned twice, and both messages are 'seen' as different messages.


[...]
t   2   0   1157567088  03f9b2c6c2
t   2   0   1157567088  82f659b312
t   1   0   1157567088  2e2723c9b5
t   2   0   1157567088  fd590a5c26
t   1   0   1157567088  81884c3cef
t   1   0   1157563460  9b1dba02fa
t   1   0   1157563460  7407f9d6b8
[...]
t   1   0   1157563460  b33e439afe
s   s   [EMAIL PROTECTED]
s   s   [EMAIL PROTECTED]

So, I would conclude that some of the tokens are learned again when the same 
message is learned again by the other method. I would have to guess that the 
message identifier is created at different stages for the two methods. In 
other words, they are not looking at the exact same message when the 
identifier is created. But this is a guess.
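
To make the once-versus-twice observation easier to check, the token lines of an `sa-learn --backup` dump can be tallied by their spam count. This is a minimal Python sketch, assuming the whitespace-separated layout shown above (`t`, spam count, ham count, atime, token hash); the function name is made up for illustration.

```python
from collections import Counter

def tally_spam_counts(backup_text):
    """Count token lines ('t ...') by the value in their spam-count column."""
    tally = Counter()
    for line in backup_text.splitlines():
        fields = line.split()
        # Token lines look like: t <spam_count> <ham_count> <atime> <token>
        if len(fields) == 5 and fields[0] == "t":
            tally[int(fields[1])] += 1
    return tally

sample = """\
t   2   0   1157567088  03f9b2c6c2
t   2   0   1157567088  82f659b312
t   1   0   1157567088  2e2723c9b5
t   2   0   1157567088  fd590a5c26
t   1   0   1157567088  81884c3cef
t   1   0   1157563460  9b1dba02fa
t   1   0   1157563460  7407f9d6b8
"""
print(tally_spam_counts(sample))
```

For the seven token lines quoted here, three carry a spam count of 2 and four a spam count of 1. Run against a full backup, the same tally would show how much of the message the two learning paths have in common.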


Gary V





RE: BUG? sa-learn --ham vs spamassassin -r different results

2006-09-06 Thread Michael Scheidell
> -----Original Message-----
> From: Matt Kettler [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, September 06, 2006 8:49 AM
> To: Michael Scheidell
> Cc: users@spamassassin.apache.org
> Subject: Re: BUG? sa-learn --ham vs spamassassin -r different results
> 
> I'm wondering if it's a bug where spamassassin -r is 
> stripping markups, but sa-learn is not.

Spamassassin has a -d (remove-markup) option, but sa-learn does not.

I am not using the -d option, furthermore, sa-learn 'learns' more
tokens.

A lot more.

I would suspect that sa with -d would learn even less.

Just checked:  email I sent through does not have any spam markup in it,
no X-Spam, or virus checked headers, etc.




Re: BUG? sa-learn --ham vs spamassassin -r different results

2006-09-06 Thread Michael Scheidell




Matt Kettler wrote:
> Michael Scheidell wrote:
>> # sa-learn -L --spam and spamassassin -L -r learn the same spam
>> differently.
>> SA version 3.13, using db or sql database, doesn't seem to matter,
>> --sync or not --sync, doesn't matter.
>>
>> Also, it doesn't matter if I run sa-learn --spam or spamassassin -r
>> first.
>>
>> Further, spamassassin -r and sa-learn --spam learn differently, give
>> different results:
>
> By any chance was the message used scanned by SA already?
>
> I'm wondering if it's a bug where spamassassin -r is stripping markups,
> but sa-learn is not.

no, see my test methodology
I erased the file and created a blank one.


-- 
Michael Scheidell, CTO
SECNAP Network Security / www.secnap.com
[EMAIL PROTECTED]  / 1+561-999-5000, x 1131





Re: BUG? sa-learn --ham vs spamassassin -r different results

2006-09-06 Thread Justin Mason

Matt Kettler writes:
> Michael Scheidell wrote:
> > # sa-learn -L --spam and spamassassin -L -r learn the same spam
> > differently.
> > SA version 3.13, using db or sql database, doesn't seem to matter,
> > --sync or not --sync, doesn't matter.
> >
> > Also, it doesn't matter if I run sa-learn --spam or spamassassin -r
> > first.
> >
> > Further, spamassassin -r and sa-learn --spam learn differently, give
> > different results:
> >   
> By any chance was the message used scanned by SA already?
> 
> I'm wondering if it's a bug where spamassassin -r is stripping markups,
> but sa-learn is not.

comparing output of --debug might help, too...

--j.


Re: BUG? sa-learn --ham vs spamassassin -r different results

2006-09-06 Thread Matt Kettler
Michael Scheidell wrote:
> # sa-learn -L --spam and spamassassin -L -r learn the same spam
> differently.
> SA version 3.13, using db or sql database, doesn't seem to matter,
> --sync or not --sync, doesn't matter.
>
> Also, it doesn't matter if I run sa-learn --spam or spamassassin -r
> first.
>
> Further, spamassassin -r and sa-learn --spam learn differently, give
> different results:
>   
By any chance was the message used scanned by SA already?

I'm wondering if it's a bug where spamassassin -r is stripping markups,
but sa-learn is not.



BUG? sa-learn --ham vs spamassassin -r different results

2006-09-06 Thread Michael Scheidell
# sa-learn -L --spam and spamassassin -L -r learn the same spam
differently.
SA version 3.13, using db or sql database, doesn't seem to matter,
--sync or not --sync, doesn't matter.

Also, it doesn't matter if I run sa-learn --spam or spamassassin -r
first.

Further, spamassassin -r and sa-learn --spam learn differently, give
different results:


running spamassassin -Lr against a clean db with my test email gives me
130 tokens.
running sa-learn -L --spam against a clean db, same test email gives me
146 tokens.

Test:

# clean out old sa db:
rm -rf /var/db/spamassassin

#create new one:
mkdir /var/db/spamassassin
chown vscan:vscan /var/db/spamassassin

#test it:
su - vscan -c "sa-learn --sync && sa-learn --dump magic"

0.000  0  3  0  non-token data: bayes db version
0.000  0  0  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0  0  0  non-token data: oldest atime
0.000  0  0  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal
sync atime
0.000  0  0  0  non-token data: last expiry
atime
0.000  0  0  0  non-token data: last expire
atime delta
0.000  0  0  0  non-token data: last expire
reduction count

# run new email though it (lets not mess with dcc, razor, spamcop for
this test)
su - vscan -c "spamassassin -rL < /tmp/spam.eml"
su - vscan -c "sa-learn --sync && sa-learn --dump magic"
1 message(s) examined.

0.000  0  3  0  non-token data: bayes db version
0.000  0  1  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  130  0  non-token data: ntokens
0.000  0 1157160113  0  non-token data: oldest atime
0.000  0 1157160113  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal
sync atime
0.000  0  0  0  non-token data: last expiry
atime
0.000  0  0  0  non-token data: last expire
atime delta
0.000  0  0  0  non-token data: last expire
reduction count

with sync: (no difference)
su vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000  0  3  0  non-token data: bayes db version
0.000  0  1  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  130  0  non-token data: ntokens
0.000  0 1157160113  0  non-token data: oldest atime
0.000  0 1157160113  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal
sync atime
0.000  0  0  0  non-token data: last expiry
atime
0.000  0  0  0  non-token data: last expire
atime delta
0.000  0  0  0  non-token data: last expire
reduction count

Now try sa-learn:
su - vscan -c "sa-learn -L --spam < /tmp/spam.eml"
su - vscan -c "sa-learn --sync && sa-learn --dump magic"
Learned tokens from 1 message(s) (1 message(s) examined)

Yep, it does something different enough.

0.000  0  3  0  non-token data: bayes db version
0.000  0  2  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  227  0  non-token data: ntokens
0.000  0 1157160113  0  non-token data: oldest atime
0.000  0 1157464500  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal
sync atime
0.000  0  0  0  non-token data: last expiry
atime
0.000  0  0  0  non-token data: last expire
atime delta
0.000  0  0  0  non-token data: last expire
reduction count
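
The three counts reported in this test (130 tokens from `spamassassin -rL` alone, 146 from `sa-learn -L --spam` alone, and 227 after running both against the same database) allow a rough estimate of how many tokens the two paths share. This is a small Python sketch of that arithmetic, assuming `ntokens` counts distinct tokens and that each path produces the same token set regardless of which one runs first.

```python
def shared_tokens(only_a, only_b, both):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return only_a + only_b - both

report_path = 130   # ntokens after spamassassin -rL on a clean db
learn_path = 146    # ntokens after sa-learn -L --spam on a clean db
combined = 227      # ntokens after running both on the same db

overlap = shared_tokens(report_path, learn_path, combined)
print(overlap)                 # tokens common to both paths
print(report_path - overlap)   # tokens only the -r path produced
print(learn_path - overlap)    # tokens only the sa-learn path produced
```

Under those assumptions only 49 tokens are common to both runs, while 81 are unique to the `-r` path and 97 to the `sa-learn` path, which suggests the two commands are not tokenizing the same text.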

#control: let's run sa-learn first:
rm -rf /var/db/spamassassin
sme-500# mkdir -p spamassassin
sme-500# chown vscan:vscan spamassassin
sme-500# su - vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000  0  3  0  non-token data: bayes db version
0.000  0  0  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0  0  0  non-token data: oldest atime
0.000  0  0  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal
sync atime
0.000  0  0  0  non-token data: last expiry
atime
0.000  0  0  0  non-token data: last expire
atime delta
0.000  0  0  0  non-to

RE: Sa-learn --ham vs spamassassin -report

2006-09-05 Thread Gary V

>> Starting out with another clean database, further testing
>> shows that in fact the message was learned when I ran
>> 'spamassassin -r' (though bayes_toks remained at 12288
>> bytes), and the same message was learned again when I ran
>> 'sa-learn --spam' (and the size once again grew to 24576 bytes).
>>
>> 0.000  0  1  0  non-token data: nspam
>> 0.000  0  0  0  non-token data: nham
>> 0.000  0  177  0  non-token data: ntokens
>>
>> 0.000  0  2  0  non-token data: nspam
>> 0.000  0  0  0  non-token data: nham
>> 0.000  0  500  0  non-token data: ntokens
>
> Q: did you run sa-learn --sync after each one? MAYBE spamassassin
> doesn't --sync, but writes it to a journal.


No, I did not sync after, no bayes_journal exists. I ran 'sa-learn --sync' 
to create the initial empty database. As you said, I think there is data 
that is stripped when 'spamassassin -r' is used. According to 'man 
spamassassin-run':


"If the message contains SpamAssassin markup, the markup will be stripped 
out automatically before submission."


Looking at a backup of the data, some tokens are learned once, and others 
twice, so I would have to guess that because of the changes made to the data 
prior to use of 'spamassassin -r', it is not recognized as the same message 
submitted by sa-learn. Could be wrong.


v   3   db_version # this must be the first line!!!
v   2   num_spam
v   0   num_nonspam
t   1   0   1115454743  300b46a9c8
t   1   0   1115454743  cec4f30a33
t   1   0   1115454743  998a30c1a0
t   1   0   1115454743  95c8cad5cc
t   2   0   1157469029  9fcdcc2e92
t   1   0   1115454743  4df5aec010
t   1   0   1115454743  4cf2f14d88
t   1   0   1115454743  e69ac5d0b5
t   1   0   1115454743  850959cf3e
t   1   0   1115454743  6fae298e7a
t   2   0   1157469029  23b47bd711
t   1   0   1115454743  fa29cdcb5d
t   2   0   1157469029  35dacb8efe
t   2   0   1157469029  24334e41ac
[...]

Gary V





RE: Sa-learn --ham vs spamassassin -report

2006-09-05 Thread Michael Scheidell
> -----Original Message-----
> From: Gary V [mailto:[EMAIL PROTECTED] 
> Sent: Monday, September 04, 2006 7:56 PM
> To: users@spamassassin.apache.org
> Subject: Re: Sa-learn --ham vs spamassassin -report
> 
> Starting out with another clean database, further testing 
> shows that in fact the message was learned when I ran 
> 'spamassassin -r' (though bayes_toks remained at 12288 
> bytes), and the same message was learned again when I ran 
> 'sa-learn --spam' (and the size once again grew to 24576 bytes).
> 
> 0.000  0  1  0  non-token data: nspam
> 0.000  0  0  0  non-token data: nham
> 0.000  0  177  0  non-token data: ntokens
> 
> 0.000  0  2  0  non-token data: nspam
> 0.000  0  0  0  non-token data: nham
> 0.000  0  500  0  non-token data: ntokens

Q: did you run sa-learn --sync after each one? MAYBE spamassassin
doesn't --sync, but writes it to a journal.


> 
> Gary V
> 


RE: Sa-learn --ham vs spamassassin -report

2006-09-05 Thread Michael Scheidell

> -----Original Message-----
> From: Gary V [mailto:[EMAIL PROTECTED] 
> Sent: Monday, September 04, 2006 7:56 PM
> To: users@spamassassin.apache.org
> Subject: Re: Sa-learn --ham vs spamassassin -report
> 
> Starting out with another clean database, further testing 
> shows that in fact 
> the message was learned when I ran 'spamassassin -r' (though 
> bayes_toks 
> remained at 12288 bytes), and the same message was learned 
> again when I ran 
> 'sa-learn --spam' (and the size once again grew to 24576 bytes).
> 
> 0.000  0  1  0  non-token data: nspam
> 0.000  0  0  0  non-token data: nham
> 0.000  0  177  0  non-token data: ntokens
> 
> 0.000  0  2  0  non-token data: nspam
> 0.000  0  0  0  non-token data: nham
> 0.000  0  500  0  non-token data: ntokens
> 
> Gary V

Ok, I'll bet that spamassassin -r and sa-learn --spam strip different
headers from the emails prior to deciding if they have been learned.

Why did spamassassin -r only learn 177 tokens while sa-learn --spam learned
323 tokens?
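
The 323 figure is the difference between the two `ntokens` values (500 - 177). For repeated experiments it may be easier to diff two `sa-learn --dump magic` outputs mechanically. This is a minimal Python sketch, assuming each line has the form `prob spam ham atime non-token data: label` with the value in the third column, as in the dumps quoted in this thread; `parse_magic` and `magic_delta` are illustrative names, not part of SpamAssassin.

```python
def parse_magic(text):
    """Map each 'non-token data' label to its integer value."""
    marker = "non-token data:"
    values = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()   # tolerate quoted mail text
        if marker not in line:
            continue
        head, label = line.split(marker, 1)
        fields = head.split()
        if len(fields) >= 3:
            values[label.strip()] = int(fields[2])
    return values

def magic_delta(before, after):
    """Return label -> (after - before) for labels whose value changed."""
    a, b = parse_magic(before), parse_magic(after)
    return {k: b[k] - a[k] for k in b if k in a and b[k] != a[k]}

first = """\
0.000  0  1    0  non-token data: nspam
0.000  0  0    0  non-token data: nham
0.000  0  177  0  non-token data: ntokens
"""
second = """\
0.000  0  2    0  non-token data: nspam
0.000  0  0    0  non-token data: nham
0.000  0  500  0  non-token data: ntokens
"""
print(magic_delta(first, second))
```

The sketch expects each entry on a single line, so labels that the mail client wrapped onto a second line (for example "last journal sync atime") would need to be rejoined before parsing.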

-- 
Michael Scheidell, CTO
561-999-5000, ext 1131
SECNAP Network Security Corporation
Keep up to date with latest information on IT security: Real time
security alerts: http://www.secnap.com/news
 


Re: Sa-learn --ham vs spamassassin -report

2006-09-04 Thread Gary V
I understand. In the test I had an issue with, the database started out (at 
least appears to be) empty, I ran 'spamassassin -r' and the size did not 
change, then I ran sa-learn, and it did. Certainly not as good a test as 
actually checking the contents of the database:


-rw-------  1 amavis amavis 12288 2006-08-26 18:18 bayes_seen
-rw-------  1 amavis amavis 12288 2006-08-27 12:18 bayes_toks

sfa:~# su amavis -c 'spamassassin -r < email.txt'
[2762] warn: reporter: SpamCop message older than 2 days, not reporting
1 message(s) examined.

-rw-------  1 amavis amavis 12288 2006-09-03 10:51 bayes_seen
-rw-------  1 amavis amavis 12288 2006-09-03 10:51 bayes_toks

sfa:~# su amavis -c 'sa-learn --spam < email.txt'
Learned tokens from 1 message(s) (1 message(s) examined)

-rw-------  1 amavis amavis 12288 2006-09-03 10:53 bayes_seen
-rw-------  1 amavis amavis 24576 2006-09-03 10:53 bayes_toks

I erased the files and started over for the second test.



Starting out with another clean database, further testing shows that in fact 
the message was learned when I ran 'spamassassin -r' (though bayes_toks 
remained at 12288 bytes), and the same message was learned again when I ran 
'sa-learn --spam' (and the size once again grew to 24576 bytes).


0.000  0  1  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  177  0  non-token data: ntokens

0.000  0  2  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  500  0  non-token data: ntokens

Gary V





Re: Sa-learn --ham vs spamassassin -report

2006-09-04 Thread Gary V



Gary V wrote:

> It did work this time. Even with the spamcop error...
>
> sfa:~# su amavis -c 'sa-learn --dump magic'
> 0.000  0  3  0  non-token data: bayes db version
> 0.000  0  7  0  non-token data: nspam
> [...]
>
> sfa:~# su amavis -c 'spamassassin -r < email.txt'
> [2812] warn: reporter: SpamCop message older than 2 days, not reporting
> 1 message(s) examined.
>
> sfa:~# su amavis -c 'sa-learn --dump magic'
> 0.000  0  3  0  non-token data: bayes db version
> 0.000  0  8  0  non-token data: nspam
> [...]

> Bear in mind that if you have auto-learning enabled, some of the spam
> you're trying to learn and report has already been learned correctly as
> spam, so SpamAssassin won't try to re-learn it unless its status has
> changed (e.g. you're now telling it that the mail was ham, not spam).
> If the auto-learning mechanism made the correct assessment at scan-time,
> then you won't see any Bayes changes when you try to learn it manually
> later.
>
> In other words, you're not witnessing a bug, you're seeing the designed
> behaviour at work.  Incidentally, this would be the same whether you
> were using '-r' or '--spam'--the same Bayes training routine gets called
> in both cases, and the success or failure of the reporting effort has no
> bearing on this.
>
> --
> Robert LeBlanc <[EMAIL PROTECTED]>
> Renaissoft, Inc.
> Maia Mailguard



I understand. In the test I had an issue with, the database started out (at 
least appears to be) empty, I ran 'spamassassin -r' and the size did not 
change, then I ran sa-learn, and it did. Certainly not as good a test as 
actually checking the contents of the database:


-rw-------  1 amavis amavis 12288 2006-08-26 18:18 bayes_seen
-rw-------  1 amavis amavis 12288 2006-08-27 12:18 bayes_toks

sfa:~# su amavis -c 'spamassassin -r < email.txt'
[2762] warn: reporter: SpamCop message older than 2 days, not reporting
1 message(s) examined.

-rw-------  1 amavis amavis 12288 2006-09-03 10:51 bayes_seen
-rw-------  1 amavis amavis 12288 2006-09-03 10:51 bayes_toks

sfa:~# su amavis -c 'sa-learn --spam < email.txt'
Learned tokens from 1 message(s) (1 message(s) examined)

-rw-------  1 amavis amavis 12288 2006-09-03 10:53 bayes_seen
-rw-------  1 amavis amavis 24576 2006-09-03 10:53 bayes_toks

I erased the files and started over for the second test.

Gary V





Re: Sa-learn --ham vs spamassassin -report

2006-09-04 Thread Robert LeBlanc
Gary V wrote:

> It did work this time. Even with the spamcop error...
> 
> sfa:~# su amavis -c 'sa-learn --dump magic'
> 0.000  0  3  0  non-token data: bayes db version
> 0.000  0  7  0  non-token data: nspam
> [...]
> 
> sfa:~# su amavis -c 'spamassassin -r < email.txt'
> [2812] warn: reporter: SpamCop message older than 2 days, not reporting
> 1 message(s) examined.
> 
> sfa:~# su amavis -c 'sa-learn --dump magic'
> 0.000  0  3  0  non-token data: bayes db version
> 0.000  0  8  0  non-token data: nspam
> [...]

Bear in mind that if you have auto-learning enabled, some of the spam
you're trying to learn and report has already been learned correctly as
spam, so SpamAssassin won't try to re-learn it unless its status has
changed (e.g. you're now telling it that the mail was ham, not spam).
If the auto-learning mechanism made the correct assessment at scan-time,
then you won't see any Bayes changes when you try to learn it manually
later.

In other words, you're not witnessing a bug, you're seeing the designed
behaviour at work.  Incidentally, this would be the same whether you
were using '-r' or '--spam'--the same Bayes training routine gets called
in both cases, and the success or failure of the reporting effort has no
bearing on this.
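
The behaviour described above can be pictured with a toy model. The sketch below is not SpamAssassin's code. It only assumes a "seen" table keyed by a message identifier that records whether the message was last learned as spam or ham, and every name in it is invented for illustration.

```python
class ToyBayes:
    """Toy model of 'learn once unless the classification changes'."""

    def __init__(self):
        self.seen = {}      # message id -> "spam" or "ham"
        self.nspam = 0
        self.nham = 0

    def learn(self, msg_id, kind):
        """Return True if the database changed, False if it was a no-op."""
        previous = self.seen.get(msg_id)
        if previous == kind:
            return False                 # already learned this way: skip
        if previous is not None:
            self._adjust(previous, -1)   # forget the old classification
        self._adjust(kind, +1)
        self.seen[msg_id] = kind
        return True

    def _adjust(self, kind, step):
        if kind == "spam":
            self.nspam += step
        else:
            self.nham += step

db = ToyBayes()
print(db.learn("msg-1", "spam"))   # first time: learned
print(db.learn("msg-1", "spam"))   # same status again: nothing changes
print(db.learn("msg-1", "ham"))    # status changed: relearned as ham
print(db.nspam, db.nham)
```

In this model, a copy of the same mail that ends up with a different identifier would simply be learned a second time.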

-- 
Robert LeBlanc <[EMAIL PROTECTED]>
Renaissoft, Inc.
Maia Mailguard 





[Fwd: Re: Sa-learn --ham vs spamassassin -report]

2006-09-04 Thread Michael Scheidell





Sorry for the top-post, but it seems right in context.
Is it possible that if spamassassin -r < spam.eml fails one of the
remote tests (like SpamCop, 'too old'), it doesn't go on to any of
the other tests?

Is the report to SpamCop done BEFORE the Bayesian learning?

-------- Original Message --------
Subject: Re: [AMaViS-user] Sa-learn --ham vs spamassassin -report
Date: Mon, 04 Sep 2006 10:10:24 -0700
References: <[EMAIL PROTECTED]>

Michael Scheidell wrote:
> > -----Original Message-----
>   
>> From: [EMAIL PROTECTED] 
>> [mailto:[EMAIL PROTECTED]] On Behalf Of Gary V
>> Sent: Monday, September 04, 2006 12:49 AM
>> To: amavis-user@lists.sourceforge.net
>> Subject: Re: [AMaViS-user] Sa-learn --ham vs spamassassin -report
>>
>> Gary wrote:
>> So, I was wrong. A look at 'man spamassassin-run' says that 
>> Bayes will learn the message. Now I'm not sure why that 
>> didn't happen for me.
>> My next assumption would be all the submissions have to be 
>> successful before it is learned, and I had a Spamcop error.
>>
>> 
>
> Guess I would want to run sa-learn --spam if spamassassin -r returns an
> error?
> sa-learn --ham if spamassassin -k returns an error?
> Etc?

In my own learning scripts, I use spamassassin -r together with sa-learn 
--[ham|spam], for exactly the reason you stumbled onto here.

Bill Taroli



-- 
Michael Scheidell, CTO
SECNAP Network Security / www.secnap.com
[EMAIL PROTECTED]  / 1+561-999-5000, x 1131





RE: Sa-learn --ham vs spamassassin -report

2006-09-04 Thread Gary V

> On Sun, Sep 03, 2006 at 10:27:55PM -0400, Michael Scheidell wrote:
> > Sa coach sends stream to spamd with 'TELL' protocol.
> > It then calls the equivalent of 'spamassassin -r' (for spam) or '-z
> > for ham' or -f for forget.
> >
> > Do I need to call sa-learn --ham and sa-learn --spam also?
>
> No.

> Gary: can you try again? If message already learned, it won'!



It did work this time. Even with the spamcop error...

sfa:~# su amavis -c 'sa-learn --dump magic'
0.000  0  3  0  non-token data: bayes db version
0.000  0  7  0  non-token data: nspam
[...]

sfa:~# su amavis -c 'spamassassin -r < email.txt'
[2812] warn: reporter: SpamCop message older than 2 days, not reporting
1 message(s) examined.

sfa:~# su amavis -c 'sa-learn --dump magic'
0.000  0  3  0  non-token data: bayes db version
0.000  0  8  0  non-token data: nspam
[...]

Gary V





RE: Sa-learn --ham vs spamassassin -report

2006-09-04 Thread Michael Scheidell
> -----Original Message-----
> From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
> Sent: Monday, September 04, 2006 12:07 AM
> To: users@spamassassin.apache.org
> Subject: Re: Sa-learn --ham vs spamassassin -report
> 
> On Sun, Sep 03, 2006 at 10:27:55PM -0400, Michael Scheidell wrote:
> > Sa coach sends stream to spamd with 'TELL' protocol.
> > It then calls the equivalent of 'spamassassin -r' (for spam) or '-z 
> > for ham' or -f for forget.
> > 
> > Do I need to call sa-learn --ham and sa-learn --spam also?
> 
> No.

Postfix stop.
sa-learn --sync  # not needed with SQL, I think.
sa-learn --dump magic

0.000  0  3  0  non-token data: bayes db version
0.000  0  35777  0  non-token data: nspam
0.000  0  31979  0  non-token data: nham
0.000  0 134021  0  non-token data: ntokens
0.000  0 1156489182  0  non-token data: oldest atime
0.000  0 1157385371  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal
sync atime
0.000  0 1157353298  0  non-token data: last expiry
atime
0.000  0 691200  0  non-token data: last expire
atime delta
0.000  0  79280  0  non-token data: last expire
reduction count

spamassassin -k < ham.message
1 message(s) examined.

sa-learn --sync  # not needed with SQL, I think.
sa-learn --dump magic

0.000  0  3  0  non-token data: bayes db version
0.000  0  35777  0  non-token data: nspam
0.000  0  31980  0  non-token data: nham
0.000  0 134155  0  non-token data: ntokens
0.000  0 1156489182  0  non-token data: oldest atime
0.000  0 1157385371  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal
sync atime
0.000  0 1157353298  0  non-token data: last expiry
atime
0.000  0 691200  0  non-token data: last expire
atime delta
0.000  0  79280  0  non-token data: last expire
reduction count

Ok, one more ham.

spamassassin -r < spam.message

1 message(s) examined.

0.000  0  3  0  non-token data: bayes db version
0.000  0  35778  0  non-token data: nspam
0.000  0  31980  0  non-token data: nham
0.000  0 134197  0  non-token data: ntokens
0.000  0 1156489182  0  non-token data: oldest atime
0.000  0 1157385371  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal
sync atime
0.000  0 1157353298  0  non-token data: last expiry
atime
0.000  0 691200  0  non-token data: last expire
atime delta
0.000  0  79280  0  non-token data: last expire
reduction count

Ok, one more spam, new spam, old enough that spamcop rejects it?
Also, spamcop: you using generic spamassassin send address or a real
one?
Has anyone really decided if anon spamassassin reports to spamcop do
anything?

Gary: can you try again? If message already learned, it won'!



Re: Sa-learn --ham vs spamassassin -report

2006-09-03 Thread Theo Van Dinter
On Sun, Sep 03, 2006 at 10:27:55PM -0400, Michael Scheidell wrote:
> Sa coach sends stream to spamd with 'TELL' protocol.
> It then calls the equivalent of 'spamassassin -r' (for spam) or '-z for
> ham' or -f for forget.
> 
> Do I need to call sa-learn --ham and sa-learn --spam also?

No.

> If I call sa-learn --ham or --spam INSTEAD OF, I lose the ability to
> report to DCC,razor,spamcop.,pyzor, etc.

Well, you don't lose the ability to report to those, you just won't be
reporting to those at that point.

> So, is spamassassin -r a superset of sa-learn --spam? Or do I need to
> run them both to get the local Bayesian table updated?

No.  From the man page:

[...]
-r, --report
[...]
The message will also be submitted to SpamAssassin’s learning
systems; currently this is the internal Bayesian statistical-filtering
system (the BAYES rules).  (Note that if you only want to perform
statistical learning, and do not want to report mail to third-par-
ties, you should use the "sa-learn" command directly instead.)
[...]
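
Putting the man page excerpt together with the rest of the thread, the choice between the two tools comes down to whether third-party reporting is wanted. This is a small Python sketch that only builds the command line. The flags (`spamassassin -r`, `spamassassin -k`, `sa-learn --spam`, `sa-learn --ham`) are the ones discussed in this thread, while the function name and the `report` switch are illustrative assumptions.

```python
def training_command(kind, report):
    """Build an argv list for training on a message read from stdin.

    kind   -- "spam" or "ham"
    report -- True to also report/revoke with third-party services
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    if report:
        # -r reports spam and also feeds the Bayes learner;
        # -k revokes a report and learns the message as ham.
        return ["spamassassin", "-r" if kind == "spam" else "-k"]
    # Learning only, no reporting to third parties.
    return ["sa-learn", "--" + kind]

print(training_command("spam", report=True))
print(training_command("ham", report=False))
```

The returned list could be handed to `subprocess.run(argv, input=message_bytes)` by a caller that already holds the raw message. That step is left out here so the sketch runs without SpamAssassin installed.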

-- 
Randomly Generated Tagline:
 Zapp: She's built like a steak house but she handles like a bistro. 




Sa-learn --ham vs spamassassin -report

2006-09-03 Thread Michael Scheidell
I am working on a program that accepts spamassassin 'TELL' (learning)
reports (see the new 'spamassassin coach' for Outlook and Thunderbird).

Sa coach sends stream to spamd with 'TELL' protocol.
It then calls the equivalent of 'spamassassin -r' (for spam) or '-z for
ham' or -f for forget.

Do I need to call sa-learn --ham and sa-learn --spam also?

If I call sa-learn --ham or --spam INSTEAD, I lose the ability to
report to DCC, Razor, SpamCop, Pyzor, etc.

So, is spamassassin -r a superset of sa-learn --spam? Or do I need to
run them both to get the local Bayesian table updated?

It looks like spamassassin -r touches the Bayesian files, but doesn't
update them:
(Thanks to Gary V for looking at this for me)

Also, my program does change user to amavis (as reported by top and
ps aux, and verified by the ownership of the files it creates), but it
still tries to use /root/.spamassassin/user_prefs, which it can't create
as user amavis. I needed to start the program as root to use port 783.
If I use spamassassin -xr, it doesn't try to create /root/.spamassassin.


sfa:~# ls -l /var/lib/amavis/.spamassassin/
total 40
-rwxr-x---  1 amavis amavis 12288 2006-08-19 20:51 auto-whitelist
-rw-rw-rw-  1 amavis amavis    12 2006-08-27 12:18 bayes.mutex
-rw-------  1 amavis amavis 12288 2006-08-26 18:18 bayes_seen
-rw-------  1 amavis amavis 12288 2006-08-27 12:18 bayes_toks
-rwxr-x---  1 amavis amavis  1487 2006-08-19 20:51 user_prefs

sfa:~# su amavis -c 'spamassassin -r < email.txt'
[2762] warn: reporter: SpamCop message older than 2 days, not reporting
1 message(s) examined.

sfa:~# ls -l /var/lib/amavis/.spamassassin/
total 40
-rwxr-x---  1 amavis amavis 12288 2006-08-19 20:51 auto-whitelist
-rw-rw-rw-  1 amavis amavis    12 2006-09-03 10:52 bayes.mutex
-rw-------  1 amavis amavis 12288 2006-09-03 10:51 bayes_seen
-rw-------  1 amavis amavis 12288 2006-09-03 10:51 bayes_toks
-rwxr-x---  1 amavis amavis  1487 2006-08-19 20:51 user_prefs

sfa:~# su amavis -c 'sa-learn --spam < email.txt'
Learned tokens from 1 message(s) (1 message(s) examined)

sfa:~# ls -l /var/lib/amavis/.spamassassin/
total 52
-rwxr-x---  1 amavis amavis 12288 2006-08-19 20:51 auto-whitelist
-rw-rw-rw-  1 amavis amavis    15 2006-09-03 10:53 bayes.mutex
-rw-------  1 amavis amavis 12288 2006-09-03 10:53 bayes_seen
-rw-------  1 amavis amavis 24576 2006-09-03 10:53 bayes_toks
-rwxr-x---  1 amavis amavis  1487 2006-08-19 20:51 user_prefs

Looks like spamassassin -r is needed to report spam, but sa-learn --spam
is needed to train the Bayesian filters?


-- 
Michael Scheidell, CTO
SECNAP Network Security
561-999-5000 x 1131
www.secnap.com


Re: sa-learn ham and auto_whitelist

2005-10-20 Thread Matt Kettler

At 10:07 AM 10/20/2005, FH wrote:

>> Really, you shouldn't be looking at the scores. You should be looking at
>> what rules the messages are hitting. Only this can tell you the "why" of
>> the matter. Everything else is just looking at the results.
>
> Makes sense, I'll dig into that a little deeper to see if I can figure out
> specifically what's triggering it.  Assuming I do find something is it better
> to try and modify the rule directly or to come up w/ some sort of "counter"
> rule?


It depends on what you find.

Sometimes the hits will suggest your bayes training is way off, and you 
might need some re-training if BAYES_99 keeps hitting a lot of nonspam mail.


Other times you'll notice your trusted_networks needs to be set manually 
because dialup RBLs and rules are causing FPs due to SA guessing the dialup 
ISP's mailserver is a part of your network. (see "TrustPath" in the wiki)


Still other times you'll find a bona fide bug in a rule. Most commonly these 
occur in the rules looking for forged mail clients. In those cases, lower the 
score of the rule with a score statement in your local.cf and check 
Bugzilla to see if it's been reported or fixed in a new release.







Re: sa-learn ham and auto_whitelist

2005-10-20 Thread FH
Thanks for the reply/info

-- Original Message --
Received: Wed, 19 Oct 2005 01:10:06 PM EDT
From: Matt Kettler <[EMAIL PROTECTED]>
To: FH <[EMAIL PROTECTED]>
Cc: users@spamassassin.apache.org
Subject: Re: sa-learn ham and auto_whitelist
> > 
> > - I know I could add a "whitelist_from" to local.cf but I was hoping 
> > for a more elegant solution ;)
> 
> That is infinitely more elegant than using spamassassin
> --add-addr-to-whitelist. Better would be to use whitelist_from_rcvd
> 

Ok, I'll start using this in the interim until I can figure out the
auto_whitelisting.


> > - Does everything I did look right?  Are there other tricks/tips that 
> > I missed?
> 
> Really, you shouldn't be looking at the scores. You should be looking at
> what rules the messages are hitting. Only this can tell you the "why" of
> the matter. Everything else is just looking at the results.
> 
Makes sense, I'll dig into that a little deeper to see if I can figure out
specifically what's triggering it.  Assuming I do find something is it better
to try and modify the rule directly or to come up w/ some sort of "counter"
rule?



Also regarding the auto_whitelist function, how can I verify it is working? 
I've been looking through the logs and don't really see anything there.  The
messages themselves don't seem to indicate anything.  The auto-whitelist file
in the SA dir seems to be growing, but while auto-whitelist.mutex is tracking
the date/time of the non .mutex file it hasn't changed size since I started
using the whitelist (about a week now).  

Thanks




Re: sa-learn ham and auto_whitelist

2005-10-19 Thread Matt Kettler
FH wrote:
> I have a script that goes through and looks for ham mailboxes every 6
> hours[1], I also recently added the below to my local.cf file:
> 
> use_auto_whitelist 1
> auto_whitelist_path /etc/mail/spamassassin/auto-whitelist
> 
> and primed the auto-whitelist w/ 
> 
> spamassassin --add-addr-to-whitelist= [according to the ORA
> book]
>

The ORA book apparently was mistaken. IMHO, this should *only* ever be used to
correct accidental contamination of the AWL database.

It should never be used as a mechanism to try to whitelist a sender, as its
effects decay as additional messages are received and you'll have to keep
re-running it to achieve the same results.
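
The decay described above can be illustrated with a small sketch. It assumes the commonly documented AWL adjustment, adjusted = score + (mean - score) * factor with auto_whitelist_factor at 0.5, and it assumes the manual whitelist entry behaves like one stored message with a score of -100. The seed values and function names are illustrative only; they are not taken from the SpamAssassin source.

```python
def awl_adjust(score, total, count, factor=0.5):
    """Apply one AWL step.

    score : the message score before AWL is applied
    total : sum of earlier pre-AWL scores stored for this sender
    count : number of earlier messages stored for this sender
    Returns (adjusted_score, new_total, new_count).
    """
    if count == 0:
        adjusted = score
    else:
        mean = total / count
        adjusted = score + (mean - score) * factor
    # Assumption: the pre-AWL score is what gets added to the history.
    return adjusted, total + score, count + 1


def simulate(seed_total, seed_count, scores, factor=0.5):
    """Return the adjusted score of each message in arrival order."""
    total, count = seed_total, seed_count
    adjusted_scores = []
    for score in scores:
        adjusted, total, count = awl_adjust(score, total, count, factor)
        adjusted_scores.append(round(adjusted, 2))
    return adjusted_scores


# Hypothetical sender: seeded once at -100, then five messages that each
# score 8.0 before AWL.
print(simulate(-100.0, 1, [8.0] * 5))
```

Under these assumptions the adjusted scores climb from -46 towards the raw score with every new message, which is why a one-off seed has to be repeated while a whitelist_from_rcvd entry does not.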

> and restarted spamd.  This was about a week ago but the user is still
> reporting the emails from this address are consistently coming through marked
> as spam.
> 
> Some more relevant info:
> - I'm running SA 3.0.2 w/ Postfix and for the most part it seems to be working
> ok.  Since I've added the scripts that look for ham/spam automatically every 6
> hours the hit rate has significantly improved.
> 
> - The emails that are marked as spam are in a foreign language (Korean in case
> that's significant somehow), however there are other emails in that language
> that come through ok.
> 
> - I know I could add a "whitelist_from" to local.cf but I was hoping for a
> more elegant solution ;)

That is infinitely more elegant than using spamassassin
--add-addr-to-whitelist. Better would be to use whitelist_from_rcvd.


> 
> - I don't allow user defined rules.
> 
> 
> Questions:
> - Does everything I did look right?  Are there other tricks/tips that I
> missed?

Really, you shouldn't be looking at the scores. You should be looking at what
rules the messages are hitting. Only this can tell you the "why" of the matter.
Everything else is just looking at the results.
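
One way to follow this advice is to pull the rule names out of the X-Spam-Status header of each misfiled message and count them. The sketch below assumes the header carries a comma-separated tests= field, as SpamAssassin 3.x headers normally do; the sample header values are made up for illustration, although the rule names are ones that appear in logs elsewhere in this thread.

```python
import re
from collections import Counter


def rules_hit(status):
    """Return the rule names listed in the tests= field of a header value."""
    match = re.search(r"tests=(.*?)(?:\s+\w+=|\s*$)", status, re.DOTALL)
    if not match:
        return []
    # Folded headers put whitespace after some commas, so strip each name.
    names = [name.strip() for name in match.group(1).split(",")]
    return [name for name in names if name and name.lower() != "none"]


def tally(statuses):
    """Count how often each rule fired across several messages."""
    counts = Counter()
    for status in statuses:
        counts.update(rules_hit(status))
    return counts


# Hypothetical header values from two messages of the same sender.
samples = [
    "Yes, score=6.1 required=5.0 tests=FROM_ENDS_IN_NUMS,HTML_MESSAGE,\n"
    "\tNO_REAL_NAME autolearn=no version=3.0.2",
    "Yes, score=5.4 required=5.0 tests=HTML_MESSAGE,NO_REAL_NAME "
    "autolearn=no version=3.0.2",
]
for rule, hits in tally(samples).most_common():
    print(rule, hits)
```

Rules that show up on every false positive are the ones worth rescoring in local.cf.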




sa-learn ham and auto_whitelist

2005-10-19 Thread FH
I have a script that goes through and looks for ham mailboxes every 6
hours[1], I also recently added the below to my local.cf file:

use_auto_whitelist 1
auto_whitelist_path /etc/mail/spamassassin/auto-whitelist

and primed the auto-whitelist w/ 

spamassassin --add-addr-to-whitelist= [according to the ORA
book]

and restarted spamd.  This was about a week ago but the user is still
reporting the emails from this address are consistently coming through marked
as spam.

Some more relevant info:
- I'm running SA 3.0.2 w/ Postfix and for the most part it seems to be working
ok.  Since I've added the scripts that look for ham/spam automatically every 6
hours the hit rate has significantly improved.

- The emails that are marked as spam are in a foreign language (Korean in case
that's significant somehow), however there are other emails in that language
that come through ok.

- I know I could add a "whitelist_from" to local.cf but I was hoping for a
more elegant solution ;)

- I don't allow user defined rules.


Questions:
- Does everything I did look right?  Are there other tricks/tips that I
missed?

- Is it just a matter of giving it enough time?  Looking at the spam scores of
the emails, they are still coming in all over the place.  There haven't been
that many since the auto_whitelist went into place but we've been working
through sa-learn for a couple of months now and nothing seems to be changing.

Thanks


[1] Cronjob that runs 
find /local/home -name ham -type f ! \( -size 0 \) -ls -exec hamproc {} \;

hamproc contains
/usr/local/bin/sa-learn --ham --showdots --mbox "$1"




Re: sa-learn ham from my emails

2005-02-15 Thread Thomas Arend
On Monday, 14 February 2005 23:13, Daniel Cañas wrote:
> On Feb 14, 2005, at 3:34 PM, Thomas Arend wrote:
> > On Monday, 14 February 2005 20:50, Daniel Cañas wrote:
> >> I have over 2000 emails that I have as ham and would like to feed to
> >> sa-learn..
> >
[..]

> >> I have legit spam that I want to learn but I am afraid to do it if I
> >> don't have corresponding number of ham.
> >
> > In my opinion and experience this is bullshit.
>
> Cool.. this is good to know as I can collect tons of spam.

I have a ratio of 1 : 40 and bayes works fine.


Thomas
-- 
icq:133073900
http://www.t-arend.de




Re: sa-learn ham from my emails

2005-02-14 Thread Daniel Cañas
On Feb 14, 2005, at 2:54 PM, Jim Maul wrote:
> Daniel Cañas wrote:
> > I have over 2000 emails that I have as ham and would like to feed to
> > sa-learn..
> > The emails are all mine (that is they are addressed to me) is this a
> > problem for sa-learn?
> > Will it learn the headers and mark my email address as a token for
> > ham... causing bayes to not work correctly for my address?
> > I have legit spam that I want to learn but I am afraid to do it if I
> > don't have corresponding number of ham.
> > I guess the question is:
> > Is feeding a bunch of emails addressed to a single person into
> > sa-learn a good thing to do?
>
> I don't believe this to be an issue.  Not to mention that feeding the
> same number of ham/spam is not necessary either.  Many people have
> bayes databases with largely different spam/ham numbers.
>
> -Jim

cool.. Thanks...




Re: sa-learn ham from my emails

2005-02-14 Thread Daniel Cañas
On Feb 14, 2005, at 3:34 PM, Thomas Arend wrote:
> On Monday, 14 February 2005 20:50, Daniel Cañas wrote:
> > I have over 2000 emails that I have as ham and would like to feed to
> > sa-learn..
>
> You should train them as ham.

That is my plan

> > The emails are all mine (that is they are addressed to me) is this a
> > problem for sa-learn?
>
> Where is the problem? If they are not for you, why did you get them?
>
> > Will it learn the headers and mark my email address as a token for
> > ham... causing bayes to not work correctly for my address?
>
> The address will be one token. If you feed spam to sa-learn your
> address will also be a token for spam. But bayes does not work only
> on one token.
>
> > I have legit spam that I want to learn but I am afraid to do it if I
> > don't have corresponding number of ham.
>
> In my opinion and experience this is bullshit.

Cool.. this is good to know as I can collect tons of spam.

> > I guess the question is:
> > Is feeding a bunch of emails addressed to a single person into
> > sa-learn a good thing to do?
>
> Why not? I run spamassassin on a single user system. You can have an
> individual database for every user or a common db for all users. In
> the last case you should train spam not only for one user.

I just switched to sitewide bayes and the spam I train is addressed to 
different users.
Mostly non-existent users on my system whose mail is forwarded to the 
admin account.

> Thomas
> --
> icq:133073900
> http://www.t-arend.de



Re: sa-learn ham from my emails

2005-02-14 Thread Thomas Arend
On Monday, 14 February 2005 20:50, Daniel Cañas wrote:
> I have over 2000 emails that I have as ham and would like to feed to
> sa-learn..

You should train them as ham.

>
> The emails are all mine (that is they are addressed to me) is this a
> problem for sa-learn?

Where is the problem? If they are not for you, why did you get them?

>
> Will it learn the headers and mark my email address as a token for
> ham... causing bayes to not work correctly for my address?

The address will be one token. If you feed spam to sa-learn your address will 
also be a token for spam. But bayes does not work only on one token.

>
> I have legit spam that I want to learn but I am afraid to do it if I
> don't have corresponding number of ham.

In my opinion and experience this is bullshit.

> I guess the question is:
> Is feeding a bunch of emails addressed to a single person into sa-learn
> a good thing to do?

Why not? I run spamassassin on a single user system. You can have an 
individual database for every user or a common db for all users. In the last 
case you should train spam not only for one user.
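
The point about the address being only one token among many can be made concrete with a toy calculation. This is a minimal sketch using a plain naive-Bayes style combination, not SpamAssassin's actual combiner, and every count and probability in it is invented for illustration.

```python
from math import prod


def token_probability(spam_hits, ham_hits, nspam, nham):
    """Estimate how spammy a token is from how often it was seen in each class."""
    spam_rate = spam_hits / nspam
    ham_rate = ham_hits / nham
    return spam_rate / (spam_rate + ham_rate)


def combine(probabilities):
    """Combine per-token probabilities into one message probability."""
    spam_side = prod(probabilities)
    ham_side = prod(1.0 - p for p in probabilities)
    return spam_side / (spam_side + ham_side)


# Hypothetical database: 4000 spam and 100 ham learned, and the recipient
# address occurs in every single one of them.
address = token_probability(4000, 100, 4000, 100)
print(address)

# Two strongly spammy tokens, with and without the address token.
print(combine([0.95, 0.90]))
print(combine([0.95, 0.90, address]))
```

Because the address appears in all ham and all spam, its estimate lands in the middle and the combined result is essentially unchanged, even with a lopsided ham/spam ratio.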


Thomas
-- 
icq:133073900
http://www.t-arend.de




Re: sa-learn ham from my emails

2005-02-14 Thread Jim Maul
Daniel Cañas wrote:
> I have over 2000 emails that I have as ham and would like to feed to 
> sa-learn..
> 
> The emails are all mine (that is they are addressed to me) is this a 
> problem for sa-learn?
> 
> Will it learn the headers and mark my email address as a token for 
> ham... causing bayes to not work correctly for my address?
> 
> I have legit spam that I want to learn but I am afraid to do it if I 
> don't have corresponding number of ham.
> 
> I guess the question is:
> Is feeding a bunch of emails addressed to a single person into sa-learn 
> a good thing to do?

I don't believe this to be an issue.  Not to mention that feeding the 
same number of ham/spam is not necessary either.  Many people have bayes 
databases with largely different spam/ham numbers.

-Jim



sa-learn ham from my emails

2005-02-14 Thread Daniel Cañas
I have over 2000 emails that I have as ham and would like to feed to 
sa-learn..

The emails are all mine (that is they are addressed to me) is this a 
problem for sa-learn?

Will it learn the headers and mark my email address as a token for 
ham... causing bayes to not work correctly for my address?

I have legit spam that I want to learn but I am afraid to do it if I 
don't have corresponding number of ham.

I guess the question is:
Is feeding a bunch of emails addressed to a single person into sa-learn 
a good thing to do?



RE: sa-learn ham

2004-11-30 Thread Gustafson, Tim
Gray,

Sorry for the delay in response.

I just wanted to let you know that your script worked PERFECTLY and I
now have a sensible "newest atime", which allowed me to expire my
database properly.

Thanks a million!

Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 


-Original Message-
From: Gray, Richard [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 25, 2004 11:45 AM
To: Gustafson, Tim; users@spamassassin.apache.org
Subject: RE: sa-learn ham


 
We had a similar problem with our system a while back (SA 2.64, Exim 4
using exiscan)

I found the attached script. It didn't work perfectly, so I edited it a
bit. 

However, this was 2-3 months ago, and I didn't comment my changes (it
was only for my company ;) )

We had a problem that because it wasn't expiring tokens, there were some
really old tokens in there too, which caused problems as well. There is
a line in the code that throws away old tokens. If your Dbase is new
then this won't be a problem for you.

Anyway, I don't know which bits I added/removed/changed, but if you have
a problem you can always drop me a line and I'll see what I can do.

Richard




Re: sa-learn ham

2004-11-26 Thread Gavin Cato
I agree, autolearn in conjunction with the odd manual insert works very well
here, although I'm still having trouble blocking the variations of those
ridiculous drugs/rx msgs.

0.000  0 1781758  0  non-token data: nspam
0.000  0  319835  0  non-token data: nham

Cheers

Gav

 
> 
> While i would *not* recommend running on autolearning exclusively, it is
> working incredibly well here with the occasional manual sa-learn here
> and there.  sa-learn --dump magic shows the following for my system:
> 
> 0.000  0   1105  0  non-token data: nspam
> 0.000  0  28077  0  non-token data: nham
> 
> 
> That's like a 1:25 ratio of spam:ham and it is quite rare that I see any
> bayes scores that aren't bayes_0 or bayes_99.  Of course, your mileage
> may and probably will vary.
> 
> -Jim




Re: sa-learn ham

2004-11-25 Thread Michael Parker
On Thu, Nov 25, 2004 at 11:26:02AM -0500, Gustafson, Tim wrote:
> I was referring to the fact that the newest "A" time is 20-something
> years in the future.  Apparently, this is what is keeping it from
> expiring tokens.
> 
> 

This is a known problem with bayes in version < 3.0.  Upgrade to 3.0.1
and do a --backup and --restore and it will remove the tokens with
bogus atimes.

Michael




RE: sa-learn ham

2004-11-25 Thread Gray, Richard
 
We had a similar problem with our system a while back (SA 2.64, Exim 4
using exiscan)

I found the attached script. It didn't work perfectly, so I edited it a
bit. 

However, this was 2-3 months ago, and I didn't comment my changes (it
was only for my company ;) )

We had a problem that because it wasn't expiring tokens, there were some
really old tokens in there too, which caused problems as well. There is
a line in the code that throws away old tokens. If your Dbase is new
then this won't be a problem for you.

Anyway, I don't know which bits I added/removed/changed, but if you have
a problem you can always drop me a line and I'll see what I can do.

Richard

-Original Message-
From: Gustafson, Tim [mailto:[EMAIL PROTECTED] 
Sent: 25 November 2004 16:26
To: Bill Landry; users@spamassassin.apache.org
Subject: RE: sa-learn ham

I was referring to the fact that the newest "A" time is 20-something
years in the future.  Apparently, this is what is keeping it from
expiring tokens.


Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 



-Original Message-
From: Bill Landry [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 25, 2004 11:09 AM
To: users@spamassassin.apache.org
Subject: Re: sa-learn ham


- Original Message -
From: "Gustafson, Tim" <[EMAIL PROTECTED]>

> I've attached the output of "sa-learn --force-expire -D" to this
> e-mail.
>
> Is there any way to "repair" this problem?  I'd like to keep the 
> database since it's so incredibly effective right now.

All looks fine here, what's the problem that you're referring to?  If
you are talking about the following lines:
=
Can't use estimation method for expiry, something fishy, calculating
optimal atime delta (first pass)
&
couldn't find a good delta atime, need more token difference, skipping
expire
=

that's normal when there is not enough token difference since the last
expire to require another expire process to happen.

Bill










db-to-text2.pl
Description: db-to-text2.pl


RE: sa-learn ham

2004-11-25 Thread Gustafson, Tim
I was referring to the fact that the newest "A" time is 20-something
years in the future.  Apparently, this is what is keeping it from
expiring tokens.


Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 



-Original Message-
From: Bill Landry [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 25, 2004 11:09 AM
To: users@spamassassin.apache.org
Subject: Re: sa-learn ham


- Original Message - 
From: "Gustafson, Tim" <[EMAIL PROTECTED]>

> I've attached the output of "sa-learn --force-expire -D" to this
> e-mail.
>
> Is there any way to "repair" this problem?  I'd like to keep the
> database since it's so incredibly effective right now.

All looks fine here, what's the problem that you're referring to?  If you
are talking about the following lines:
=
Can't use estimation method for expiry, something fishy, calculating optimal
atime delta (first pass)
&
couldn't find a good delta atime, need more token difference, skipping
expire
=

that's normal when there is not enough token difference since the last
expire to require another expire process to happen.

Bill





Re: sa-learn ham

2004-11-25 Thread Bill Landry
- Original Message - 
From: "Gustafson, Tim" <[EMAIL PROTECTED]>

> I've attached the output of "sa-learn --force-expire -D" to this e-mail.
>
> Is there any way to "repair" this problem?  I'd like to keep the
> database since it's so incredibly effective right now.

All looks fine here, what's the problem that you're referring to?  If you
are talking about the following lines:
=
Can't use estimation method for expiry, something fishy, calculating optimal
atime delta (first pass)
&
couldn't find a good delta atime, need more token difference, skipping
expire
=

that's normal when there is not enough token difference since the last
expire to require another expire process to happen.

Bill



RE: sa-learn ham

2004-11-25 Thread Gustafson, Tim
I've attached the output of "sa-learn --force-expire -D" to this e-mail.

Is there any way to "repair" this problem?  I'd like to keep the
database since it's so incredibly effective right now.

Thanks.

Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 



-Original Message-
From: David B Funk [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 25, 2004 1:01 AM
To: Gustafson, Tim
Cc: SA Users List
Subject: RE: sa-learn ham

Note that 'newest atime' value, it's 21 years in the future. That
is "poisoning" your expire, so it's not doing anything.

  perl -e 'print scalar localtime(1762110386),"\n";'
  Sun Nov  2 13:06:26 2025

The 'ntokens' should be a more-or-less fixed value, based upon the setting
of your "bayes_expiry_max_db_size". The values of nspam & nham should
continually increase but ntokens should hit an upper bound and go no
higher.
[EMAIL PROTECTED] sa-learn --force-expire -D
debug: Score set 0 chosen.
debug: running in taint mode? no
debug: using "/usr/local/share/spamassassin" for default rules dir
debug: using "/usr/local/etc/mail/spamassassin" for site rules dir
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 84478 tie-ing to DB file R/O /usr/local/etc/mail/spamassassin/bayes_toks
debug: bayes: 84478 tie-ing to DB file R/O /usr/local/etc/mail/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 2 chosen.
debug: Initialising learner
debug: Initialising learner
debug: Syncing Bayes journal and expiring old tokens...
debug: lock: 84478 created /usr/local/etc/mail/spamassassin/bayes.lock.maze.meitech.com.84478
debug: lock: 84478 trying to get lock on /usr/local/etc/mail/spamassassin/bayes with 0 retries
debug: lock: 84478 link to /usr/local/etc/mail/spamassassin/bayes.lock: link ok
debug: bayes: 84478 tie-ing to DB file R/W /usr/local/etc/mail/spamassassin/bayes_toks
debug: bayes: 84478 tie-ing to DB file R/W /usr/local/etc/mail/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
..
synced Bayes databases from journal in 0 seconds: 3542 unique entries (3879 total entries)
debug: bayes: expiry check keep size, 75% of max: 112500
debug: bayes: token count: 1765956, final goal reduction size: 1653456
debug: bayes: First pass?  Current: 1101397025, Last: 1101388519, atime: 0, count: 0, newdelta: 0, ratio: 0
debug: bayes: Can't use estimation method for expiry, something fishy, calculating optimal atime delta (first pass)
debug: bayes: atime token reduction
debug: bayes:   ===
debug: bayes: 43200     1765908
debug: bayes: 86400     1765908
debug: bayes: 172800    1765908
debug: bayes: 345600    1765908
debug: bayes: 691200    1765908
debug: bayes: 1382400   1765908
debug: bayes: 2764800   1765908
debug: bayes: 5529600   1765908
debug: bayes: 11059200  1765908
debug: bayes: 22118400  1765908
debug: bayes: couldn't find a good delta atime, need more token difference, skipping expire.
debug: Syncing complete.
debug: bayes: 84478 untie-ing
debug: bayes: 84478 untie-ing db_toks
debug: bayes: 84478 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 84478 unlink /usr/local/etc/mail/spamassassin/bayes.lock
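
The numbers in this log can be read as follows: the keep size is 75% of a 150000-token maximum (112500), so the goal is to drop 1765956 - 112500 = 1653456 tokens, yet every candidate delta would drop 1765908. The sketch below reproduces that pattern on toy data. It assumes that expiry measures token age against the newest atime and tries deltas that double from 43200 seconds, as the table suggests; the selection rule in pick_delta is a simplified guess, not SpamAssassin's real logic.

```python
def reduction_table(atimes, newest_atime, first_delta=43200, steps=10):
    """For each candidate delta, count tokens at least that old."""
    rows = []
    delta = first_delta
    for _ in range(steps):
        expired = sum(1 for atime in atimes if newest_atime - atime >= delta)
        rows.append((delta, expired))
        delta *= 2
    return rows


def pick_delta(rows, goal_reduction):
    """Simplified rule: first delta that expires something but not too much."""
    for delta, expired in rows:
        if 0 < expired <= goal_reduction:
            return delta
    return None


# Toy database: 100 tokens, one per hour, ending in late November 2004.
real = [1101300000 - hour * 3600 for hour in range(100)]
bogus = 1762110386  # the "newest atime" from the dump, in the year 2025

healthy = reduction_table(real, max(real))
poisoned = reduction_table(real + [bogus], bogus)

print(pick_delta(healthy, goal_reduction=25))
print(pick_delta(poisoned, goal_reduction=26))
```

With the bogus timestamp present, every real token looks decades old at every delta, so no delta gives a usable reduction and the expire is skipped, which matches the flat column in the log.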



RE: sa-learn ham

2004-11-25 Thread David B Funk
On Wed, 24 Nov 2004, Gustafson, Tim wrote:

> How do you keep your ntokens so low?
>
> Mine averages ((nspam + nham) * 10).  Yours is basically (nspam + nham).
> Do you run some job that expires tokens or something?  I'm running
> sa-learn --force-expire once a day (and it takes about 2-3 minutes to
> run) but the ntokens never seems to go down.  :\
>
> Tim

Tim, that's because your Bayes is FUBAR, you've got a "future"
message in there that's fouling up your expire. Run it with
a '-D' and look at the output, I'll bet that it doesn't actually
expire anything.

Revisiting your '--dump magic' output again:

[EMAIL PROTECTED] sa-learn --dump magic
0.000  0  2  0  non-token data: bayes db version
0.000  0  88033  0  non-token data: nspam
0.000  0  15592  0  non-token data: nham
0.000  0 1729756  0  non-token data: ntokens
0.000  0 1010964573  0  non-token data: oldest atime
0.000  0 1762110386  0  non-token data: newest atime
0.000  0 1101309901  0  non-token data: last journal sync atime
0.000  0 1101301792  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count

Note that 'newest atime' value, it's 21 years in the future. That
is "poisoning" your expire, so it's not doing anything.

  perl -e 'print scalar localtime(1762110386),"\n";'
  Sun Nov  2 13:06:26 2025

The 'ntokens' should be a more-or-less fixed value, based upon the setting
of your "bayes_expiry_max_db_size". The values of nspam & nham should
continually increase but ntokens should hit an upper bound and go no
higher.
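
The same check can be scripted. This is a rough sketch that parses text shaped like the '--dump magic' output above and flags a newest atime that lies in the future. The parsing assumes five whitespace-separated columns with the label last, and the helper names are invented for this example.

```python
from datetime import datetime, timezone


def parse_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    magic = {}
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[4].startswith("non-token data:"):
            label = parts[4].split(":", 1)[1].strip()
            magic[label] = int(parts[2])
    return magic


def newest_atime_in_future(magic, now):
    """True when the newest token access time is later than 'now' (epoch secs)."""
    newest = magic.get("newest atime")
    return newest is not None and newest > now


dump = """\
0.000  0  88033  0  non-token data: nspam
0.000  0  15592  0  non-token data: nham
0.000  0 1729756  0  non-token data: ntokens
0.000  0 1010964573  0  non-token data: oldest atime
0.000  0 1762110386  0  non-token data: newest atime
"""

magic = parse_magic(dump)
when = datetime.fromtimestamp(magic["newest atime"], tz=timezone.utc)
print(when.isoformat())
# 1101360000 is roughly the date of this thread (late November 2004).
print(newest_atime_in_future(magic, now=1101360000))
```

In a cron job one would pass the current time instead of a fixed value and raise an alert when the function returns True.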

On a busy system that's been running for a while, nspam & nham can
easily outstrip ntokens. Here's my stats:

server15$ sa-learn --dump magic
0.000  0  2  0  non-token data: bayes db version
0.000  0 1275494  0  non-token data: nspam
0.000  0 525068  0  non-token data: nham
0.000  0 227192  0  non-token data: ntokens
0.000  0 1101252542  0  non-token data: oldest atime
0.000  0 1101360564  0  non-token data: newest atime
0.000  0 1101360564  0  non-token data: last journal sync atime
0.000  0 1101338998  0  non-token data: last expiry atime
0.000  0  86400  0  non-token data: last expire atime delta
0.000  0  73470  0  non-token data: last expire reduction count



-- 
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_adminIowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: sa-learn ham

2004-11-24 Thread Matt Barton
Gustafson, Tim wrote:
> How do you keep your ntokens so low?
>
> Mine averages ((nspam + nham) * 10).  Yours is basically (nspam + nham).
> Do you run some job that expires tokens or something?  I'm running
> sa-learn --force-expire once a day (and it takes about 2-3 minutes to
> run) but the ntokens never seems to go down.  :\

I don't run --force-expire at all.  I think it will automatically expire 
tokens when certain criteria are met -- none of which I can recall as I 
write this e-mail, though I know you can find it online.

I have a cronjob that runs a script every half-hour that checks for 
e-mails that need to be manually fed in to sa-learn.  It pulls them out 
of designated ham and spam IMAP folders, runs them through sa-learn, and 
then runs sa-learn again with --sync.  I think the --sync may be what 
does it, but I don't know for sure.

> -Original Message-
> From: Matt Barton [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 24, 2004 11:27 AM
> To: SA Users List
> Subject: Re: sa-learn ham
>
> Since we're all playing show-and-tell, here is a dump of the magic on my
> company's mail server.
>
> 0.000  0  3  0  non-token data: bayes db version
> 0.000  0 101024  0  non-token data: nspam
> 0.000  0 164343  0  non-token data: nham
> 0.000  0 240026  0  non-token data: ntokens
--
Matt Barton
Webexcellence
PH: 317.423.3548 x22
TF: 800.808.6332 x22
FX: 317.423.8735
[EMAIL PROTECTED]
www.webexc.com


RE: sa-learn ham

2004-11-24 Thread Gustafson, Tim
How do you keep your ntokens so low?

Mine averages ((nspam + nham) * 10).  Yours is basically (nspam + nham).
Do you run some job that expires tokens or something?  I'm running
sa-learn --force-expire once a day (and it takes about 2-3 minutes to
run) but the ntokens never seems to go down.  :\

Tim

Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 

-Original Message-
From: Matt Barton [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 11:27 AM
To: SA Users List
Subject: Re: sa-learn ham

Since we're all playing show-and-tell, here is a dump of the magic on my
company's mail server.

0.000  0  3  0  non-token data: bayes db version
0.000  0 101024  0  non-token data: nspam
0.000  0 164343  0  non-token data: nham
0.000  0 240026  0  non-token data: ntokens




Re: sa-learn ham

2004-11-24 Thread Matt Barton
Gustafson, Tim wrote:
> 0.000  0  2  0  non-token data: bayes db version
> 0.000  0  88033  0  non-token data: nspam
> 0.000  0  15592  0  non-token data: nham
> 0.000  0 1729756  0  non-token data: ntokens
> 0.000  0 1010964573  0  non-token data: oldest atime
> 0.000  0 1762110386  0  non-token data: newest atime
> 0.000  0 1101309901  0  non-token data: last journal sync atime
> 0.000  0 1101301792  0  non-token data: last expiry atime
> 0.000  0  0  0  non-token data: last expire atime delta
> 0.000  0  0  0  non-token data: last expire reduction count
>
> I agree with Jim that having your SPAM/HAM numbers match doesn't really
> matter, as long as you have sufficient amounts of each.  I think the
> "threshold" where my users started to experience the best filtering
> accuracy was when I topped 1000 SPAMs and HAMs.  But, as Jim said
> before, your mileage may vary.

Since we're all playing show-and-tell, here is a dump of the magic on my 
company's mail server.

0.000  0  3  0  non-token data: bayes db version
0.000  0 101024  0  non-token data: nspam
0.000  0 164343  0  non-token data: nham
0.000  0 240026  0  non-token data: ntokens
0.000  0 1101226944  0  non-token data: oldest atime
0.000  0 1101313137  0  non-token data: newest atime
0.000  0 1101313136  0  non-token data: last journal sync atime
0.000  0 1101270336  0  non-token data: last expiry atime
0.000  0  43200  0  non-token data: last expire atime delta
0.000  0 196502  0  non-token data: last expire reduction count

My auto-learn thresholds are set as follows in the global local.cf.
bayes_auto_learn_threshold_nonspam 0.8
bayes_auto_learn_threshold_spam 10.0
It is very important that you keep your bayes_min_[ham|spam]_num 
settings to at least 1000.
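
How the two thresholds above partition messages can be sketched as below. This is a simplification: it only compares the score against the two configured values, whereas SpamAssassin applies further conditions before it auto-learns, and the handling of scores exactly on a boundary is an assumption.

```python
def autolearn_decision(score, nonspam_threshold=0.8, spam_threshold=10.0):
    """Classify a score using the two thresholds quoted above.

    Returns "ham", "spam" or "no" (not learned at all).
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


# Hypothetical scores: a clean mail, two borderline ones, an obvious spam.
for score in (-1.0, 0.9, 6.5, 14.2):
    print(score, autolearn_decision(score))
```

Everything between 0.8 and 10.0 is left alone, which is where manual sa-learn runs on misfiled mail fill the gap.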

--
Matt Barton
Webexcellence
PH: 317.423.3548 x22
TF: 800.808.6332 x22
FX: 317.423.8735
[EMAIL PROTECTED]
www.webexc.com


RE: sa-learn ham

2004-11-24 Thread Gary W. Smith
Ronan, 

I have a cronjob that does the learning for me 

#!/bin/sh
sa-learn --spam --mbox /home/fizzle/spam > /dev/null 2>&1
:> /home/fizzle/spam
sa-learn --ham --mbox /home/fizzle/ham > /dev/null 2>&1
:> /home/fizzle/ham
#Root likes to lock the file (owner of cronjob), now reset it.
chown -R filter:users /etc/mail/spamassassin/bayes

We have an IMAP account set up that we just move a bunch of spams/hams to
(into the appropriate folder) and every night at 1:30 AM it runs.  We
used to do this on every server (6 in total) but now we have a central
MySQL db for bayes.  

This solution works well for us.  It's also an easy way to get ham into
the system.
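
A variation on the same job, sketched in Python, only empties a mailbox after sa-learn reports success, so a failed run does not throw away the training mail. It uses the same sa-learn options as the script above (--spam, --ham, --mbox); the function names and the injectable run parameter are inventions of this example, and the paths are whatever spool directory is passed in.

```python
import os
import subprocess


def learn_commands(spool_dir, sa_learn="sa-learn"):
    """Build one sa-learn command per mbox, spam first and then ham."""
    commands = []
    for kind in ("spam", "ham"):
        mbox = os.path.join(spool_dir, kind)
        commands.append([sa_learn, "--" + kind, "--mbox", mbox])
    return commands


def learn_and_truncate(spool_dir, run=subprocess.run):
    """Feed each non-empty mbox to sa-learn and empty it on success."""
    for command in learn_commands(spool_dir):
        mbox = command[-1]
        if not os.path.exists(mbox) or os.path.getsize(mbox) == 0:
            continue  # nothing to learn from
        result = run(command, stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            # Same effect as ':> file' in the shell script.
            open(mbox, "w").close()


# Shows the commands that would be run for the spool used above.
for command in learn_commands("/home/fizzle"):
    print(" ".join(command))
```

The ownership fix at the end of the shell script would still be needed if the job runs as root.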

Gary Smith



Re: sa-learn ham

2004-11-24 Thread hamann . w
>> 
>> hi all.
>> for those of you running large volume servers you no doubt have an 
>> abundance of spam to feed into sa-learn, and i suppose that goes for all 
>> sizes of volumes.
>> but one question. how do you manage to match the same number with hams / 
>> real messages. how do you go about bumping up the numbers to even the 
>> DB? Am i right in saying that basically any mail that's not spam is ham or 
>> is ham only supposed to be mail that are false positives, ie have been 
>> tagged but aren't really spam.
>> here at the university there are 3 admins who if they want could read 
>> other people's email... Data protection blah blah but it's simply a side 
>> effect of administering the systems.
>> 

Hi,

SA automatically learns mails that make it below the threshold as ham, so the
system is trained on actual mails received at a server.
Now what you can do is to set up some mailbox for users to drop spam mail that
was missed, and perhaps another one for ham mail that was labelled as spam.

Wolfgang Hamann



RE: sa-learn ham

2004-11-24 Thread Gustafson, Tim
> ahh yeah hit reply instead of reply-all.
>
> anyone out there see anything major or minorly wrong with the output
> below??

For what it's worth, here's my output:

[EMAIL PROTECTED] sa-learn --dump magic
0.000  0  2  0  non-token data: bayes db version
0.000  0  88033  0  non-token data: nspam
0.000  0  15592  0  non-token data: nham
0.000  0 1729756  0  non-token data: ntokens
0.000  0 1010964573  0  non-token data: oldest atime
0.000  0 1762110386  0  non-token data: newest atime
0.000  0 1101309901  0  non-token data: last journal sync atime
0.000  0 1101301792  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count

I agree with Jim that having your SPAM/HAM numbers match doesn't really
matter, as long as you have sufficient amounts of each.  I think the
"threshold" where my users started to experience the best filtering
accuracy was when I topped 1000 SPAMs and HAMs.  But, as Jim said
before, your mileage may vary.

Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 




RE: sa-learn ham

2004-11-24 Thread Gustafson, Tim
Autolearn fails a lot of the time because your sendmail and/or
SpamAssassin process doesn't have write access to the bayes_* files.
Make sure that you chown and chmod these files accordingly.  :)

I had a big problem with this originally, and file permissions fixed it
for me.


Tim Gustafson
MEI Technology Consulting, Inc
[EMAIL PROTECTED]
(516) 379-0001 Office
(516) 480-1870 Mobile/Emergencies
(516) 908-4185 Fax
http://www.meitech.com/ 



-Original Message-
From: Jim Maul [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 10:07 AM
To: Ronan; SA Users List
Subject: Re: sa-learn ham


Ronan wrote:
> so it doesn't make a difference if you have inordinately larger amounts
> of one than the other?? I would have thought it would've worked better
> with more ham...
> i read somewhere on the list that it's best to balance.
> 

you'll get conflicting answers to this question.  The only real answer 
that i can tell is "see what works best on your system".  If you get 
significantly more spam than ham, and autolearning is enabled, then you 
will have significantly more spam tokens than ham.  Over here at the 
hospital where i work, we get significantly more ham than spam so my 
numbers are usually the opposite from everyone else's.  This goes to show
that even with opposite ratios, the bayes system still works properly. 
This should be argument enough that you don't need to have a balance in 
the number of spam/ham tokens.

> on a related note why is my autolearn not functioning properly???
> 

good question.  I'm not sure what the causes of autolearn=failed are. 
But i did happen to notice that you appear to have ALL_TRUSTED firing on
every email you receive.  There may be larger issues with the setup 
here.  Posting on the list (and not just to me) may provide you with 
more answers as many more people will see the message as well.

-Jim

> bash-2.03$ sa-learn --dump magic
> 0.000  0  3  0  non-token data: bayes db version
> 0.000  0   1077  0  non-token data: nspam
> 0.000  0    427  0  non-token data: nham
> 0.000  0 120915  0  non-token data: ntokens
> 0.000  0 1082126382  0  non-token data: oldest atime
> 0.000  0 1101307652  0  non-token data: newest atime
> 0.000  0 1101307670  0  non-token data: last journal sync atime
> 0.000  0 1100189181  0  non-token data: last expiry atime
> 0.000  0  0  0  non-token data: last expire atime delta
> 0.000  0  0  0  non-token data: last expire reduction count
> bash-2.03$ tail -f /var/log/syslog|grep autolearn
> Nov 24 14:56:12 elisha spamd[5125]: result: . -1 - ALL_TRUSTED scantime=2.8,size=1755,mid=<[EMAIL PROTECTED].uk>,autolearn=failed
> Nov 24 14:56:15 elisha spamd[5125]: result: . -1 - ALL_TRUSTED,FROM_ENDS_IN_NUMS,NO_REAL_NAME scantime=0.6,size=1155,mid=<[EMAIL PROTECTED]>,autolearn=failed
> Nov 24 14:56:27 elisha spamd[6919]: result: .  0 - ALL_TRUSTED,FROM_ENDS_IN_NUMS,MISSING_SUBJECT,NO_REAL_NAME scantime=2.8,size=1329,mid=<[EMAIL PROTECTED]>,autolearn=no
> Nov 24 14:56:29 elisha spamd[7794]: result: . -1 - ALL_TRUSTED,FROM_ENDS_IN_NUMS scantime=2.5,size=1705,mid=<[EMAIL PROTECTED]>,autolearn=failed
> Nov 24 14:56:31 elisha spamd[5467]: result: .  0 - ALL_TRUSTED,FROM_ENDS_IN_NUMS,J_CHICKENPOX_21,J_CHICKENPOX_24,NO_REAL_NAME scantime=5.2,size=4798,mid=<[EMAIL PROTECTED]>,autolearn=failed
> Nov 24 14:56:32 elisha spamd[6919]: result: . -1 - ALL_TRUSTED,FROM_ENDS_IN_NUMS,NO_REAL_NAME scantime=2.5,size=2668,mid=<[EMAIL PROTECTED]>,autolearn=failed
> Nov 24 14:56:34 elisha spamd[7794]: result: . -1 - ALL_TRUSTED scantime=0.6,size=32341,mid=<[EMAIL PROTECTED]d.am.qub.ac.uk>,autolearn=failed
> Nov 24 14:56:35 elisha spamd[5467]: result: .  2 - FORGED_HOTMAIL_RCVD2,FORGED_RCVD_HELO,MISSING_MIMEOLE,NO_REAL_NAME,PRIORITY_NO_NAME,RCVD_IN_SORBS_DUL scantime=0.6,size=78030,mid=<[EMAIL PROTECTED]>,autolearn=no
> Nov 24 14:56:36 elisha spamd[8365]: result: . -1 - ALL_TRUSTED,HTML_MESSAGE,HTML_TAG_EXIST_TBODY scantime=8.5,size=12218,mid=<[EMAIL PROTECTED]c.uk>,autolearn=failed
> Nov 24 14:56:38 elisha spamd[6919]: result: . -1 - ALL_TRUSTED scantime=1.1,size=14404,mid=<[EMAIL PROTECTED]>,autolearn=failed
> Nov 24 14:56:38 elisha spamd[5467]: result: . -1 - ALL_TRUSTED,HTML_60_70,HTML_MESSAGE scantime=1.6,size=2221,mid=<[EMAIL PROTECTED]>,autolearn=failed





Re: sa-learn ham

2004-11-24 Thread Ronan

Jim Maul wrote:
> Ronan wrote:
>> So it doesn't make a difference if you have inordinately larger
>> amounts of one than the other?? I would have thought it would've
>> worked better with more ham...
>> I read somewhere on the list that it's best to balance.
>
> You'll get conflicting answers to this question.  The only real answer
> that I can tell is "see what works best on your system".  If you get
> significantly more spam than ham, and autolearning is enabled, then you
> will have significantly more spam tokens than ham.  Over here at the
> hospital where I work, we get significantly more ham than spam, so my
> numbers are usually the opposite of everyone else's.  This goes to show
> that even with opposite ratios, the Bayes system still works properly.
> This should be argument enough that you don't need to have a balance in
> the number of spam/ham tokens.
>
>> On a related note, why is my autolearn not functioning properly???
>
> Good question.  I'm not sure what the causes of autolearn=failed are.
> But I did happen to notice that you appear to have ALL_TRUSTED firing
> on every email you receive.  There may be larger issues with the setup
> here.  Posting on the list (and not just to me) may provide you with
> more answers, as many more people will see the message as well.

Ahh yeah, hit reply instead of reply-all.
Anyone out there see anything majorly or minorly wrong with the output
below??

thanks
ronan

> -Jim
bash-2.03$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       1077          0  non-token data: nspam
0.000          0        427          0  non-token data: nham
0.000          0     120915          0  non-token data: ntokens
0.000          0 1082126382          0  non-token data: oldest atime
0.000          0 1101307652          0  non-token data: newest atime
0.000          0 1101307670          0  non-token data: last journal sync atime
0.000          0 1100189181          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
bash-2.03$ tail -f /var/log/syslog|grep autolearn
Nov 24 14:56:12 elisha spamd[5125]: result: . -1 - ALL_TRUSTED 
scantime=2.8,size=1755,mid=<[EMAIL PROTECTED]>,autolearn=failed 

Nov 24 14:56:15 elisha spamd[5125]: result: . -1 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS,NO_REAL_NAME 
scantime=0.6,size=1155,mid=<[EMAIL PROTECTED]>,autolearn=failed
Nov 24 14:56:27 elisha spamd[6919]: result: .  0 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS,MISSING_SUBJECT,NO_REAL_NAME 
scantime=2.8,size=1329,mid=<[EMAIL PROTECTED]>,autolearn=no
Nov 24 14:56:29 elisha spamd[7794]: result: . -1 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS 
scantime=2.5,size=1705,mid=<[EMAIL PROTECTED]>,autolearn=failed
Nov 24 14:56:31 elisha spamd[5467]: result: .  0 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS,J_CHICKENPOX_21,J_CHICKENPOX_24,NO_REAL_NAME 
scantime=5.2,size=4798,mid=<[EMAIL PROTECTED]>,autolearn=failed
Nov 24 14:56:32 elisha spamd[6919]: result: . -1 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS,NO_REAL_NAME 
scantime=2.5,size=2668,mid=<[EMAIL PROTECTED]>,autolearn=failed
Nov 24 14:56:34 elisha spamd[7794]: result: . -1 - ALL_TRUSTED 
scantime=0.6,size=32341,mid=<[EMAIL PROTECTED]>,autolearn=failed 

Nov 24 14:56:35 elisha spamd[5467]: result: .  2 - 
FORGED_HOTMAIL_RCVD2,FORGED_RCVD_HELO,MISSING_MIMEOLE,NO_REAL_NAME,PRIORITY_NO_NAME,RCVD_IN_SORBS_DUL 
scantime=0.6,size=78030,mid=<[EMAIL PROTECTED]>,autolearn=no 

Nov 24 14:56:36 elisha spamd[8365]: result: . -1 - 
ALL_TRUSTED,HTML_MESSAGE,HTML_TAG_EXIST_TBODY 
scantime=8.5,size=12218,mid=<[EMAIL PROTECTED]>,autolearn=failed 

Nov 24 14:56:38 elisha spamd[6919]: result: . -1 - ALL_TRUSTED 
scantime=1.1,size=14404,mid=<[EMAIL PROTECTED]>,autolearn=failed 

Nov 24 14:56:38 elisha spamd[5467]: result: . -1 - 
ALL_TRUSTED,HTML_60_70,HTML_MESSAGE 
scantime=1.6,size=2221,mid=<[EMAIL PROTECTED]>,autolearn=failed 
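
A quick way to see how the autolearn outcomes in a log like this break down is to tally the autolearn= field. The following is only a minimal sketch: the function name tally_autolearn and the regular expression are illustrative assumptions, not part of SpamAssassin, and the sample entries are three of the log entries quoted above, joined onto single lines.

```python
import re
from collections import Counter

# Matches the autolearn=<status> field at the end of a spamd "result:" entry.
AUTOLEARN_RE = re.compile(r"autolearn=(\w+)")


def tally_autolearn(lines):
    """Count how often each autolearn status appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = AUTOLEARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three entries copied from the log in this thread, each joined onto one line.
sample = [
    "Nov 24 14:56:12 elisha spamd[5125]: result: . -1 - ALL_TRUSTED "
    "scantime=2.8,size=1755,mid=<[EMAIL PROTECTED]>,autolearn=failed",
    "Nov 24 14:56:27 elisha spamd[6919]: result: .  0 - "
    "ALL_TRUSTED,FROM_ENDS_IN_NUMS,MISSING_SUBJECT,NO_REAL_NAME "
    "scantime=2.8,size=1329,mid=<[EMAIL PROTECTED]>,autolearn=no",
    "Nov 24 14:56:38 elisha spamd[5467]: result: . -1 - "
    "ALL_TRUSTED,HTML_60_70,HTML_MESSAGE "
    "scantime=1.6,size=2221,mid=<[EMAIL PROTECTED]>,autolearn=failed",
]

for status, count in sorted(tally_autolearn(sample).items()):
    print(f"autolearn={status}: {count}")
```

Because the field is matched per line, the same function also works on the wrapped form of the log shown above. Over all eleven quoted entries it would count nine autolearn=failed and two autolearn=no.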



Re: sa-learn ham

2004-11-24 Thread Jim Maul
Ronan wrote:
> So it doesn't make a difference if you have inordinately larger amounts
> of one than the other?? I would have thought it would've worked better
> with more ham...
> I read somewhere on the list that it's best to balance.

You'll get conflicting answers to this question.  The only real answer
that I can tell is "see what works best on your system".  If you get
significantly more spam than ham, and autolearning is enabled, then you
will have significantly more spam tokens than ham.  Over here at the
hospital where I work, we get significantly more ham than spam, so my
numbers are usually the opposite of everyone else's.  This goes to show
that even with opposite ratios, the Bayes system still works properly.
This should be argument enough that you don't need to have a balance in
the number of spam/ham tokens.

> On a related note, why is my autolearn not functioning properly???

Good question.  I'm not sure what the causes of autolearn=failed are.
But I did happen to notice that you appear to have ALL_TRUSTED firing on
every email you receive.  There may be larger issues with the setup
here.  Posting on the list (and not just to me) may provide you with
more answers, as many more people will see the message as well.

-Jim
bash-2.03$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       1077          0  non-token data: nspam
0.000          0        427          0  non-token data: nham
0.000          0     120915          0  non-token data: ntokens
0.000          0 1082126382          0  non-token data: oldest atime
0.000          0 1101307652          0  non-token data: newest atime
0.000          0 1101307670          0  non-token data: last journal sync atime
0.000          0 1100189181          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
bash-2.03$ tail -f /var/log/syslog|grep autolearn
Nov 24 14:56:12 elisha spamd[5125]: result: . -1 - ALL_TRUSTED 
scantime=2.8,size=1755,mid=<[EMAIL PROTECTED]>,autolearn=failed 

Nov 24 14:56:15 elisha spamd[5125]: result: . -1 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS,NO_REAL_NAME 
scantime=0.6,size=1155,mid=<[EMAIL PROTECTED]>,autolearn=failed
Nov 24 14:56:27 elisha spamd[6919]: result: .  0 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS,MISSING_SUBJECT,NO_REAL_NAME 
scantime=2.8,size=1329,mid=<[EMAIL PROTECTED]>,autolearn=no
Nov 24 14:56:29 elisha spamd[7794]: result: . -1 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS 
scantime=2.5,size=1705,mid=<[EMAIL PROTECTED]>,autolearn=failed
Nov 24 14:56:31 elisha spamd[5467]: result: .  0 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS,J_CHICKENPOX_21,J_CHICKENPOX_24,NO_REAL_NAME 
scantime=5.2,size=4798,mid=<[EMAIL PROTECTED]>,autolearn=failed
Nov 24 14:56:32 elisha spamd[6919]: result: . -1 - 
ALL_TRUSTED,FROM_ENDS_IN_NUMS,NO_REAL_NAME 
scantime=2.5,size=2668,mid=<[EMAIL PROTECTED]>,autolearn=failed
Nov 24 14:56:34 elisha spamd[7794]: result: . -1 - ALL_TRUSTED 
scantime=0.6,size=32341,mid=<[EMAIL PROTECTED]>,autolearn=failed 

Nov 24 14:56:35 elisha spamd[5467]: result: .  2 - 
FORGED_HOTMAIL_RCVD2,FORGED_RCVD_HELO,MISSING_MIMEOLE,NO_REAL_NAME,PRIORITY_NO_NAME,RCVD_IN_SORBS_DUL 
scantime=0.6,size=78030,mid=<[EMAIL PROTECTED]>,autolearn=no
Nov 24 14:56:36 elisha spamd[8365]: result: . -1 - 
ALL_TRUSTED,HTML_MESSAGE,HTML_TAG_EXIST_TBODY 
scantime=8.5,size=12218,mid=<[EMAIL PROTECTED]>,autolearn=failed 

Nov 24 14:56:38 elisha spamd[6919]: result: . -1 - ALL_TRUSTED 
scantime=1.1,size=14404,mid=<[EMAIL PROTECTED]>,autolearn=failed 

Nov 24 14:56:38 elisha spamd[5467]: result: . -1 - 
ALL_TRUSTED,HTML_60_70,HTML_MESSAGE 
scantime=1.6,size=2221,mid=<[EMAIL PROTECTED]>,autolearn=failed 






Re: sa-learn ham

2004-11-24 Thread Jim Maul
Ronan wrote:
> Jim Maul wrote:
>> Ronan wrote:
>>> Hi all.
>>> For those of you running large-volume servers, you no doubt have an
>>> abundance of spam to feed into sa-learn, and I suppose that goes for
>>> all sizes of volumes.
>>> But one question: how do you manage to match the same number with
>>> hams / real messages? How do you go about bumping up the numbers to
>>> even out the DB? Am I right in saying that basically any mail that's
>>> not spam is ham, or is ham only supposed to be mail that is a false
>>> positive, i.e. has been tagged but isn't really spam?
>>
>> Attempting to get these numbers equal is an unnecessary and, as you've
>> discovered, almost futile task.
>>
>> While I would *not* recommend running on autolearning exclusively, it
>> is working incredibly well here with the occasional manual sa-learn
>> here and there.  sa-learn --dump magic shows the following for my
>> system:
>>
>> 0.000  0   1105  0  non-token data: nspam
>> 0.000  0  28077  0  non-token data: nham
>
> Jim, isn't your ratio of ham:spam 25:1 and not 1:25?

Oops, yep, you're correct, I had the order switched.  Regardless, my
point still stands :)

-Jim
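
The corrected ratio can be checked directly from the two counters quoted above. This is only a sketch: parse_dump_magic is a hypothetical helper, and it assumes the value sits in the third column of each "non-token data" line, as it does in the output shown in this thread.

```python
def parse_dump_magic(text):
    """Map each 'non-token data' name to its integer value.

    Assumes the layout quoted in this thread: the value is the third
    whitespace-separated column, before the 'non-token data:' label.
    """
    counts = {}
    for line in text.splitlines():
        fields, sep, name = line.partition("non-token data:")
        parts = fields.split()
        if sep and len(parts) >= 3:
            counts[name.strip()] = int(parts[2])
    return counts


# The two lines Jim posted from his own system.
sample = """\
0.000  0   1105  0  non-token data: nspam
0.000  0  28077  0  non-token data: nham
"""

counts = parse_dump_magic(sample)
ratio = counts["nham"] / counts["nspam"]
print(f"ham:spam is roughly {ratio:.1f}:1")
```

With 28077 ham and 1105 spam messages learned, that is about 25.4 ham per spam, which matches the 25:1 ham:spam reading.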


Re: sa-learn ham

2004-11-24 Thread Ronan

Jim Maul wrote:
> Ronan wrote:
>> Hi all.
>> For those of you running large-volume servers, you no doubt have an
>> abundance of spam to feed into sa-learn, and I suppose that goes for
>> all sizes of volumes.
>> But one question: how do you manage to match the same number with
>> hams / real messages? How do you go about bumping up the numbers to
>> even out the DB? Am I right in saying that basically any mail that's
>> not spam is ham, or is ham only supposed to be mail that is a false
>> positive, i.e. has been tagged but isn't really spam?
>
> Attempting to get these numbers equal is an unnecessary and, as you've
> discovered, almost futile task.
>
> While I would *not* recommend running on autolearning exclusively, it
> is working incredibly well here with the occasional manual sa-learn
> here and there.  sa-learn --dump magic shows the following for my
> system:
>
> 0.000  0   1105  0  non-token data: nspam
> 0.000  0  28077  0  non-token data: nham

Jim, isn't your ratio of ham:spam 25:1 and not 1:25?

> That's like a 1:25 ratio of ham:spam, and it is quite rare that I see
> any bayes scores that aren't bayes_0 or bayes_99.  Of course, your
> mileage may and probably will vary.
>
> -Jim
--
Regards
Ronan McGlue
==
Analyst/Programmer
Information Services
Queens University Belfast
BT7 1NN


Re: sa-learn ham

2004-11-24 Thread Jim Maul
Ronan wrote:
> Hi all.
> For those of you running large-volume servers, you no doubt have an
> abundance of spam to feed into sa-learn, and I suppose that goes for
> all sizes of volumes.
> But one question: how do you manage to match the same number with
> hams / real messages? How do you go about bumping up the numbers to
> even out the DB? Am I right in saying that basically any mail that's
> not spam is ham, or is ham only supposed to be mail that is a false
> positive, i.e. has been tagged but isn't really spam?

Attempting to get these numbers equal is an unnecessary and, as you've
discovered, almost futile task.

While I would *not* recommend running on autolearning exclusively, it is
working incredibly well here with the occasional manual sa-learn here
and there.  sa-learn --dump magic shows the following for my system:

0.000  0   1105  0  non-token data: nspam
0.000  0  28077  0  non-token data: nham

That's like a 1:25 ratio of ham:spam, and it is quite rare that I see any
bayes scores that aren't bayes_0 or bayes_99.  Of course, your mileage
may and probably will vary.

-Jim


sa-learn ham

2004-11-24 Thread Ronan
Hi all.
For those of you running large-volume servers, you no doubt have an
abundance of spam to feed into sa-learn, and I suppose that goes for all
sizes of volumes.
But one question: how do you manage to match the same number with hams /
real messages? How do you go about bumping up the numbers to even out
the DB? Am I right in saying that basically any mail that's not spam is
ham, or is ham only supposed to be mail that is a false positive, i.e.
has been tagged but isn't really spam?

Here at the university there are 3 admins who, if they wanted, could read
other people's email... Data protection blah blah, but it's simply a side
effect of administering the systems.

Putting a random selection of users' HAM emails (which could be, and
unsurprisingly are, personal) into the filter to balance the DB could be
contentious, but it's the only way to get a good selection of emails.

As I said, there are only the 3 of us but we have around 4 mail boxes,
and 3 isn't really a good representation in terms of quality of emails to
be feeding ham into sa-learn. Aside from opening up a mailbox to pleb
users and creating more havoc, what are the recommended ways of getting
around this?

thanks
ronan
--
Regards
Ronan McGlue
==
Analyst/Programmer
Information Services
Queens University Belfast
BT7 1NN


Re: sa-learn --ham not running from horde/imp.

2004-10-13 Thread sahil
Quoting Matt Kettler <[EMAIL PROTECTED]>:

> I assume you've got some bayes_path statement in your local.cf forcing SA
> to use that path. Note: if it's set to /root/* I'd suggest changing it to
> /var/amavis/*, unless you want to make root's homedir world-readable.

I do not set bayes_path in local.cf; I will do so tonight, and define it as
/var/amavis/.spamassassin/.

> Note: this process will cause your bayes DB to change ownership randomly.
> It shouldn't matter as long as the file remains world rw, which it should
> with the bayes_file_mode setting in place.

That's fine.  I would like users to "Report as Spam" any messages they deem
nefarious, and then (ideally) SA will use the updated DB to block future spam
with similar attributes.  Per-user DBs are not necessary as I don't want to
throw MySQL into the mix just yet.

Thanks for your advice - will try it out tonight and report the results.

--
Sahil Tandon




Re: sa-learn --ham not running from horde/imp.

2004-10-13 Thread Matt Kettler
At 11:37 PM 10/12/2004 -0400, Sahil Tandon wrote:
> I googled for the error but cannot find a proper solution.  Right now,
> /root/.spamassassin is a symlink to /var/amavis/.spamassassin; the files
> therein (i.e. the bayes_* files) are chown'd vscan:vscan.  They are
> updated when SA *itself* notices spam above a certain threshold, rejects
> those messages, and auto-learns their spammy existence.
>
> How to get 'sa-learn --spam' from webmail to co-exist peacefully with my
> current setup?

I assume you've got some bayes_path statement in your local.cf forcing SA
to use that path. Note: if it's set to /root/* I'd suggest changing it to
/var/amavis/*, unless you want to make root's homedir world-readable.

When doing a single global Bayes DB, add the following config option to SA:

 bayes_file_mode 777

Note that it's a mask and it's used in making directories, so it should be
777, not 666. SA won't make the Bayes DB executable, but it will add the x
bit to temp directories.

From there, be sure the directory is world rwx and the directories above
are at least world r_x.

Lastly, chmod 666 the bayes_* files.

Note: this process will cause your Bayes DB to change ownership randomly.
It shouldn't matter as long as the file remains world rw, which it should
with the bayes_file_mode setting in place.
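
The permission checklist above (directory world rwx, bayes_* files world rw) can be verified with a short script. This is a rough sketch under stated assumptions: check_shared_bayes_perms is a hypothetical helper, not part of SpamAssassin, it only looks at the directory itself and the bayes_* files in it (the parent directories are not checked), and the demonstration runs against a throwaway temporary directory rather than a real Bayes directory.

```python
import os
import stat
import tempfile


def check_shared_bayes_perms(directory, prefix="bayes_"):
    """Return a list of permission problems; an empty list means none found."""
    problems = []
    dir_mode = stat.S_IMODE(os.stat(directory).st_mode)
    if dir_mode & 0o007 != 0o007:  # "other" needs r, w and x on the directory
        problems.append(f"{directory}: not world rwx (mode {dir_mode:o})")
    for name in sorted(os.listdir(directory)):
        if not name.startswith(prefix):
            continue
        path = os.path.join(directory, name)
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode & 0o006 != 0o006:  # "other" needs r and w on each DB file
            problems.append(f"{path}: not world rw (mode {mode:o})")
    return problems


# Demonstration on a temporary directory standing in for the Bayes directory.
with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o777)
    toks = os.path.join(demo_dir, "bayes_toks")
    open(toks, "w").close()
    os.chmod(toks, 0o600)  # owner-only: should be reported
    print(check_shared_bayes_perms(demo_dir))
    os.chmod(toks, 0o666)  # world rw: should pass
    print(check_shared_bayes_perms(demo_dir))
```

On a real system the function would be pointed at whatever directory bayes_path resolves to, for example the /var/amavis/.spamassassin directory mentioned in this thread.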



Re: sa-learn --ham not running from horde/imp.

2004-10-13 Thread Thomas Bolioli
What is likely happening is that sa-learn is running as root, with
nobody's permissions, since apache su's itself to nobody by default on RH
9/FC1 (I am assuming this version of Linux from the LC_ALL/LANG issue,
although Mac OS X is a possibility). When you click the link in Horde, it
is executing that code ('/usr/local/bin/sa-learn --spam') as the user
that the webserver is running under (nobody), not the user you are
logged into Horde with (sahil?). Therefore, you are not actually learning
against the user's (sahil) Bayes DB.

When apache su's to nobody, it loses rights to root's resources, but su
is notorious for not fully assuming the su'd (nobody) persona. Part of
that is intentional, part of it is not. That is why sa-learn is picking
up root's id as the one to try and run under but fails to have the
required perms to accomplish anything.

Oh yeah, I forgot to elucidate that sa-learn figures out the
.spamassassin directory from the currently logged in user's home dir as
reported by the env vars. Those are affected by su'ing to a new user.
One example is that the 2.6X versions do not run properly under sudo but
do under su. The reason, although I have never looked into it directly,
is likely the difference between the behavior of su and sudo in how
they set env variables.

Possible fixes:
1) Don't use Horde's reporting util (recommended).
2) Run the webserver as root and use the command ('su -c
/usr/local/bin/sa-learn --spam $user' or something like it. Do man su or
su --help for more info) where $user is Horde's global variable for the
currently logged on user. (*NOT* recommended. Major security issues there.)
3) Attempt to run the command from 2 under the su'd nobody user. It may
work since sometimes su is broken depending on your build of perl, etc.
(although it has been 6 months since I used RH, I do not believe their
stock perl build was broken). It may revert to the parent process's
rights instead of seeing nobody, and su without a password needed. This
is highly unlikely to work but it is worth a shot. It will take no
time to try and can't hurt. If it does work, you have bigger problems
that you can do little to fix. Although I stress, I do not think it
will work.
I hope that helps,
Tom
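
To illustrate the point about the home directory coming from the environment, here is a small sketch of how an inherited HOME can send a process to the wrong .spamassassin directory. It is not sa-learn's actual lookup code: spamassassin_dir is a hypothetical function, and treating /var/amavis as the vscan user's home is an assumption based on the paths mentioned in this thread.

```python
import posixpath


def spamassassin_dir(env):
    """Resolve the per-user state directory from HOME in an environment mapping."""
    home = env.get("HOME")
    if not home:
        raise ValueError("HOME is not set in this environment")
    return posixpath.join(home, ".spamassassin")


# A process that dropped privileges to "nobody" but kept root's HOME.
inherited_env = {"HOME": "/root", "USER": "nobody"}
# The environment the filter user would have (assumed home directory).
vscan_env = {"HOME": "/var/amavis", "USER": "vscan"}

print(spamassassin_dir(inherited_env))
print(spamassassin_dir(vscan_env))
```

With the inherited environment the path resolves under /root, which is where the lock errors in the log below point, even though the process no longer has root's rights to write there.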

Sahil Tandon wrote:
I understand my problem might be rooted in Horde, amavisd-new, or 
Postfix.  However, I want to be sure it's not a fundamental 
misunderstanding (on my part) of how SA should be set up.

Postfix filters mail via amavisd-new (which calls SA).  Everything 
runs smoothly except the "Report as Spam" link for users viewing 
messages via webmail.  When clicked, it successfully sends a message 
to postmaster, and unsuccessfully calls sa-learn.  This is what I see 
in my logs (sorry for line wraps):

lock: 94539 cannot create tmp lockfile 
/root/.spamassassin/bayes.lock.sphinx.hamla.org.94539 for 
/root/.spamassassin/bayes.lock: Permission denied
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LC_ALL = (unset),
LANG = "en_US"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
bayes expire_old_tokens: lock: 95562 cannot create tmp lockfile 
/root/.spamassassin/bayes.lock.sphinx.hamla.org.95562 for 
/root/.spamassassin/bayes.lock: Permission denied

lock: 95562 cannot create tmp lockfile 
/root/.spamassassin/bayes.lock.sphinx.hamla.org.95562 for 
/root/.spamassassin/bayes.lock: Permission denied

The relevant line in my IMP conf.php:
$conf['spam']['program'] = '/usr/local/bin/sa-learn --spam';

I googled for the error but cannot find a proper solution.  Right now, 
/root/.spamassassin is a symlink to /var/amavis/.spamassassin; the 
files therein (i.e. the bayes_* files) are chown'd vscan:vscan.  They 
are updated when SA *itself* notices spam above a certain threshold, 
rejects those messages, and auto-learns their spammy existence.

How to get 'sa-learn --spam' from webmail to co-exist peacefully with 
my current setup?

--
Sahil Tandon



sa-learn --ham not running from horde/imp.

2004-10-13 Thread Sahil Tandon
I understand my problem might be rooted in Horde, amavisd-new, or 
Postfix.  However, I want to be sure it's not a fundamental 
misunderstanding (on my part) of how SA should be set up.

Postfix filters mail via amavisd-new (which calls SA).  Everything runs 
smoothly except the "Report as Spam" link for users viewing messages via 
webmail.  When clicked, it successfully sends a message to postmaster, 
and unsuccessfully calls sa-learn.  This is what I see in my logs (sorry 
for line wraps):

lock: 94539 cannot create tmp lockfile 
/root/.spamassassin/bayes.lock.sphinx.hamla.org.94539 for 
/root/.spamassassin/bayes.lock: Permission denied
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LC_ALL = (unset),
LANG = "en_US"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
bayes expire_old_tokens: lock: 95562 cannot create tmp lockfile 
/root/.spamassassin/bayes.lock.sphinx.hamla.org.95562 for 
/root/.spamassassin/bayes.lock: Permission denied

lock: 95562 cannot create tmp lockfile 
/root/.spamassassin/bayes.lock.sphinx.hamla.org.95562 for 
/root/.spamassassin/bayes.lock: Permission denied

The relevant line in my IMP conf.php:
$conf['spam']['program'] = '/usr/local/bin/sa-learn --spam';

I googled for the error but cannot find a proper solution.  Right now, 
/root/.spamassassin is a symlink to /var/amavis/.spamassassin; the files 
therein (i.e. the bayes_* files) are chown'd vscan:vscan.  They are 
updated when SA *itself* notices spam above a certain threshold, rejects 
those messages, and auto-learns their spammy existence.

How to get 'sa-learn --spam' from webmail to co-exist peacefully with my 
current setup?

--
Sahil Tandon