Re: sa-learn --ham ground rules
On 2008-02-13 05:14:38, Karsten Bräckelmann wrote:
> On Sun, 2008-02-10 at 21:34 +0100, Michelle Konzack wrote:
> > On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > > So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed?
> > >
> > > Dunno if there are limitations -- however, your 7k messages should be perfectly fine. Just ran a test on a 6k messages mbox file, and there was no noticeable difference to a 30 messages test.
> >
> > Yeah, you can even feed 200,000 spams from the Debian lists to it IF YOU USE A MAILBOX FILE. But the OP seems to use Maildir or MH, which is slightly different, and he seems to exceed the ARGS limit.
>
> Nope. The command fragment Gene cared to show us did *not* have any wildcard, but a dir. No bash filename expansion, no limit exceeded.

Right, but if I run 'sa-learn --spam --dir ...', sa-learn exits with an error message saying that I have tried to scan too many messages (or something similar), which would mean that Perl has exceeded its limits.

Thanks, greetings and nice day
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant

-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
Re: sa-learn --ham ground rules
On 2008-02-13 10:04:36, Matus UHLAR - fantomas wrote:
> you can just provide the directory name. sa-learn will then scan the directory w/o args limit

Sorry, but I have used --dir for ages, and if I have more than 1200-1400 messages, sa-learn exits with an error message saying that I have exceeded the limits...

Thanks, greetings and nice day
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant
Re: sa-learn --ham ground rules
> On 2008-02-08 01:49:52, Gene Heskett wrote:
> > So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed? I vaguely recall feeding it my corpus of another folder it was having trouble with a year ago, the linux-usb list, 600 to 1k messages in it and it was finished in an hour that time.

On 10.02.08 21:31, Michelle Konzack wrote:
> Many programs, including rm, mv, ls and sa-learn, have a limit on the command-line arguments, which is around 1200 to 1400.

It's not a limit of those programs; it's the system limit on how much data can be passed in arguments. And they don't exit: it's the shell that fails to execute such a process.

> Note: sa-learn exits automatically if you feed too many messages to it.

No. sa-learn, AFAIK, doesn't count that.

-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Linux - It's now safe to turn on your computer.
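The system limit Matus describes can be inspected directly. What follows is a minimal sketch, assuming a POSIX-like system where `getconf ARG_MAX` is available. The helper name `expansion_bytes` is made up for illustration, and its result is only a rough estimate, since the kernel also counts the environment and some per-argument overhead. The point is that the limit is measured in bytes, not in a fixed number of files.

```shell
#!/bin/sh
# ARG_MAX is the system-wide limit, in bytes, on the arguments plus
# environment that can be handed to a newly executed program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: ${arg_max} bytes"

# expansion_bytes DIR  (hypothetical helper, for illustration only)
# Roughly estimates how much space "DIR/*" would occupy after the shell
# expands it: each path costs its length plus one terminating NUL byte.
expansion_bytes() {
    total=0
    for f in "$1"/*; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern in an empty dir
        total=$((total + ${#f} + 1))
    done
    echo "$total"
}

# Example, not run here (the path is a placeholder):
# expansion_bytes /path/to/ham/cur
```

If the estimate for a folder stays well below `ARG_MAX`, a wildcard expansion of that folder should not be what fails.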
Re: sa-learn --ham ground rules
> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed?
> >
> > Dunno if there are limitations -- however, your 7k messages should be perfectly fine. Just ran a test on a 6k messages mbox file, and there was no noticeable difference to a 30 messages test.

On 10.02.08 21:34, Michelle Konzack wrote:
> Yeah, you can even feed 200,000 spams from the Debian lists to it IF YOU USE A MAILBOX FILE. But the OP seems to use Maildir or MH, which is slightly different, and he seems to exceed the ARGS limit.

You can just provide the directory name. sa-learn will then scan the directory itself, so the args limit does not apply.

-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
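The difference between passing a wildcard and passing the directory can be shown without sa-learn at all. In this small sketch, `count_args` is a made-up stand-in that only reports how many arguments the shell handed over; the temporary directory and file names are placeholders.

```shell
#!/bin/sh
# count_args is a stand-in for sa-learn: it only reports how many
# arguments the shell actually passed to it.
count_args() { echo "$#"; }

demo=$(mktemp -d)
touch "$demo/msg1" "$demo/msg2" "$demo/msg3"

with_wildcard=$(count_args "$demo"/*)   # shell expands to one argument per file
with_dir=$(count_args "$demo")          # the directory itself: a single argument
echo "wildcard: $with_wildcard argument(s), directory: $with_dir argument(s)"

rm -r "$demo"

# Real invocation, not run here (the path is a placeholder):
# sa-learn --ham /path/to/ham/cur
```

With the directory form the argument list stays at one entry no matter how many messages the folder holds, which is why the args limit cannot be hit that way.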
Re: sa-learn --ham ground rules
On 2008-02-08 01:49:52, Gene Heskett wrote:
> So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed? I vaguely recall feeding it my corpus of another folder it was having trouble with a year ago, the linux-usb list, 600 to 1k messages in it and it was finished in an hour that time.

Many programs, including rm, mv, ls and sa-learn, have a limit on the command-line arguments, which is around 1200 to 1400. So I would never try to feed more than 1000 messages/files at once to it.

Note: sa-learn exits automatically if you feed too many messages to it.

Thanks, greetings and nice day
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant
Re: sa-learn --ham ground rules
On Tue, 12 Feb 2008, Gene Heskett wrote:
> On Sunday 10 February 2008, Michelle Konzack wrote:
> > On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > > So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed?
> > >
> > > Dunno if there are limitations -- however, your 7k messages should be perfectly fine. Just ran a test on a 6k messages mbox file, and there was no noticeable difference to a 30 messages test.
> >
> > Yeah, you can even feed 200,000 spams from the Debian lists to it IF YOU USE A MAILBOX FILE. But the OP seems to use Maildir or MH, which is slightly different, and he seems to exceed the ARGS limit.
> >
> > Thanks, greetings and nice day
> > Michelle Konzack, Systemadministrator, Tamay Dogan Network, Debian GNU/Linux Consultant
>
> Guilty, it's all in Maildir format.

xargs then?

-- 
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
[EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: sa-learn --ham ground rules
On Sun, 2008-02-10 at 21:34 +0100, Michelle Konzack wrote:
> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed?
> >
> > Dunno if there are limitations -- however, your 7k messages should be perfectly fine. Just ran a test on a 6k messages mbox file, and there was no noticeable difference to a 30 messages test.
>
> Yeah, you can even feed 200,000 spams from the Debian lists to it IF YOU USE A MAILBOX FILE. But the OP seems to use Maildir or MH, which is slightly different, and he seems to exceed the ARGS limit.

Nope. The command fragment Gene cared to show us did *not* have any wildcard, but a dir. No bash filename expansion, no limit exceeded.

  guenther
Re: sa-learn --ham ground rules
On Tue, Feb 12, 2008 at 04:04:28PM -0500, Gene Heskett wrote:
> Guilty, it's all in Maildir format.

--dir ?
Re: sa-learn --ham ground rules
On Sunday 10 February 2008, Michelle Konzack wrote:
> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed?
> >
> > Dunno if there are limitations -- however, your 7k messages should be perfectly fine. Just ran a test on a 6k messages mbox file, and there was no noticeable difference to a 30 messages test.
>
> Yeah, you can even feed 200,000 spams from the Debian lists to it IF YOU USE A MAILBOX FILE. But the OP seems to use Maildir or MH, which is slightly different, and he seems to exceed the ARGS limit.
>
> Thanks, greetings and nice day
> Michelle Konzack, Systemadministrator, Tamay Dogan Network, Debian GNU/Linux Consultant

Guilty, it's all in Maildir format.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Re: sa-learn --ham ground rules
On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed?
>
> Dunno if there are limitations -- however, your 7k messages should be perfectly fine. Just ran a test on a 6k messages mbox file, and there was no noticeable difference to a 30 messages test.

Yeah, you can even feed 200,000 spams from the Debian lists to it IF YOU USE A MAILBOX FILE. But the OP seems to use Maildir or MH, which is slightly different, and he seems to exceed the ARGS limit.

Thanks, greetings and nice day
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant
Re: sa-learn --ham ground rules
On Tuesday 12 February 2008, John Hardin wrote:
> On Tue, 12 Feb 2008, Gene Heskett wrote:
> > On Sunday 10 February 2008, Michelle Konzack wrote:
> > > On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > > > So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed?
> > > >
> > > > Dunno if there are limitations -- however, your 7k messages should be perfectly fine. Just ran a test on a 6k messages mbox file, and there was no noticeable difference to a 30 messages test.
> > >
> > > Yeah, you can even feed 200,000 spams from the Debian lists to it IF YOU USE A MAILBOX FILE. But the OP seems to use Maildir or MH, which is slightly different, and he seems to exceed the ARGS limit.
> > >
> > > Thanks, greetings and nice day
> > > Michelle Konzack, Systemadministrator, Tamay Dogan Network, Debian GNU/Linux Consultant
> >
> > Guilty, it's all in Maildir format.
>
> xargs then?

Looks interesting if I can grok how to use it. Another day though, my plate runneth over till the weekend now. Thanks, John.

-- 
Cheers, Gene
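For anyone else wondering how xargs would be used here, a minimal sketch follows. The helper name `learn_in_batches`, the batch size and the path are all made up for illustration, and `-print0` / `-0` are assumed to be available (they are in GNU and BSD find/xargs). As others in the thread point out, handing sa-learn the directory itself avoids the argument limit anyway, so batching like this is only one option.

```shell
#!/bin/sh
# learn_in_batches DIR BATCH CMD [ARGS...]   (hypothetical helper)
# Runs CMD once per batch of at most BATCH files found under DIR.
# -print0 / -0 keep file names with spaces or odd characters intact.
learn_in_batches() {
    dir=$1
    batch=$2
    shift 2
    find "$dir" -type f -print0 | xargs -0 -n "$batch" "$@"
}

# Example, not run here (path and batch size are placeholders):
# learn_in_batches /path/to/ham/cur 500 sa-learn --ham
```

Each batch starts a fresh process, so a large folder costs several program start-ups instead of one; that is the trade-off against passing the directory directly.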
Re: sa-learn --ham ground rules
On Saturday 09 February 2008, jdow wrote:
> From: John Hardin [EMAIL PROTECTED]
> Sent: Friday, 2008, February 08 21:03
> > Gene Heskett sez:
> > > running as root since RH5.1. Yeah, I'm an un-repentant old fart.
> >
> > There's no fool like an old fool.
>
> I'm close enough to Gene's age and have known him long enough I get the right to rap his knuckles. Hm, in about a year that advances a step to rap his knuckles with an iron bar?
>
> {^_-}

Ouch, that would hurt my arthritic joints something terrible. Can it wait till I've had a chance to hit my thumbs with another cortisone shot? On second thought, the iron bar is less painful in the short term.

The last time I checked, they wanted to do surgery at $5k per thumb and I said how about cortisone? He said then (15 years ago) that it was $60 a shot, and it would hurt like hell. He was right on both counts, but that thumb still works today. Now it's the other one's turn, I guess. :)

-- 
Cheers, Gene
Re: sa-learn --ham ground rules
Gene Heskett wrote:
> On Saturday 09 February 2008, jdow wrote:
> > From: John Hardin [EMAIL PROTECTED]
> > Sent: Friday, 2008, February 08 21:03
> > > Gene Heskett sez:
> > > > running as root since RH5.1. Yeah, I'm an un-repentant old fart.
> > >
> > > There's no fool like an old fool.
> >
> > I'm close enough to Gene's age and have known him long enough I get the right to rap his knuckles. Hm, in about a year that advances a step to rap his knuckles with an iron bar?
> >
> > {^_-}
>
> Ouch, that would hurt my arthritic joints something terrible. Can it wait till I've had a chance to hit my thumbs with another cortisone shot? On second thought, the iron bar is less painful in the short term.
>
> The last time I checked, they wanted to do surgery at $5k per thumb and I said how about cortisone? He said then (15 years ago) that it was $60 a shot, and it would hurt like hell. He was right on both counts, but that thumb still works today. Now it's the other one's turn, I guess. :)

Hmm. Hurt like hell? I think that's very doctor-specific. I got a shot that was eased in slowly, front-loaded with lidocaine, back-loaded with cortisone. It was almost painless; the pain I was experiencing before the shot disappeared almost immediately due to the lidocaine, and then disappeared on a more ongoing basis due to the cortisone. Magic.

---
Chris Hoogendyk
Systems Administrator
Biology & Geology Departments
140 Morrill Science Center
University of Massachusetts, Amherst
[EMAIL PROTECTED]
---
Erdös 4
Re: sa-learn --ham ground rules
On Friday 08 February 2008, Karsten Bräckelmann wrote:
> On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
> > The sa-learn --spam can process a message in 5 to 10 seconds or so, so if I've dropped 20 doofus mails in the spam directory and fire it off, I have it done and kmail is back among the living in 2-3 minutes.
>
> This seems *way* too high. If there have been only 20 messages total in that folder, sa-learn should have processed these in a few *seconds* or less.
>
> > But, feeding it a 'ham' directory with about 7k messages in it, turned sa-learn into a 100% cpu hog, [...]
>
> What did you expect? Based on your numbers above, processing that folder would have taken 10-20 *hours*...
>
> > incrementing the message processed number only about every 3 to 5 minutes. I couldn't kill it, it kept coming back and I must have fed it a kill -9 50 times.
>
> Hmm. Kmail doesn't start one process per mail by any chance?
>
> > So what is the maximum number of files in a directory that one can feed to sa-learn --ham and expect it to achieve normal speed?
>
> Dunno if there are limitations -- however, your 7k messages should be perfectly fine. Just ran a test on a 6k messages mbox file, and there was no noticeable difference to a 30 messages test.
>
> > The command that kmail issues to it is:
> > sa-learn --ham /root/Mail/(foldername)/cur
>
> You're not using root as your ordinary user account, are you!?
>
>   guenther

In fact I do, but I have myself somewhat in a sandbox, as all the mail handling stuff except kmail runs as an unprivileged user, and kmail pulls incoming from that mailbox in /var. I've been doing that for about 2-3 years; started it back at FC2. And running as root since RH5.1. Yeah, I'm an un-repentant old fart.

-- 
Cheers, Gene
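The "10-20 hours" figure quoted above follows from Gene's own numbers: about 7k messages at 5 to 10 seconds each. A quick check in shell arithmetic, where `est_minutes` is a made-up helper name and 7000 stands in for "about 7k":

```shell
#!/bin/sh
# est_minutes COUNT SECS_PER_MSG  (hypothetical helper)
# Whole minutes needed to process COUNT messages one after another.
est_minutes() { echo $(( $1 * $2 / 60 )); }

low=$(est_minutes 7000 5)     # 35000 seconds
high=$(est_minutes 7000 10)   # 70000 seconds
echo "7000 messages: ${low} to ${high} minutes"
```

That works out to 583 to 1166 minutes, roughly 9.7 to 19.4 hours, so the slow full-folder run is consistent with the per-message time and the real question is why each message takes 5 to 10 seconds.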
Re: sa-learn --ham ground rules
From: Gene Heskett [EMAIL PROTECTED]
Sent: Friday, 2008, February 08 16:43
> On Friday 08 February 2008, Karsten Bräckelmann wrote:
> > On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
> > > The command that kmail issues to it is:
> > > sa-learn --ham /root/Mail/(foldername)/cur
> >
> > You're not using root as your ordinary user account, are you!?
> >
> >   guenther
>
> In fact I do, but I have myself somewhat in a sandbox, as all the mail handling stuff except kmail runs as an unprivileged user, and kmail pulls incoming from that mailbox in /var. I've been doing that for about 2-3 years; started it back at FC2. And running as root since RH5.1. Yeah, I'm an un-repentant old fart.

Gene, how many times have I told you don't do that with Linux? Old fart or not, you broke it. Now you get to fix it.

You declared you run all mail handling as an unprivileged user. Then you try to run sa-learn as root on root's mailbox. There is rather a problem there if you think about it. Sit and ruminate a few minutes. Seriously - think. Have you considered what you are doing and where at least one hyper-obvious problem lies? If you're not banging your forehead and screaming "Doh!" by now, here is the clue bat. If you've figured it out, DUCK!

SpamAssassin will be using a Bayes database collected as that unprivileged user. It cannot use one generated as root and placed in root's directory structure. The last I knew, you were trying to use per-user Bayes.

So sa-learn as root will build the file in a place the unprivileged process cannot access AND will likely leave the file with privileges that prevent access by that unprivileged user. That's issue one for you to fix.

And if you don't fix your errant ways, mama's gonna whup you good.

What size machine are you trying to work on? How deep into your swap file are you when you run sa-learn?

{^_^} Joanne, ashamed I've known you all these years, Gene. You shame me by not taking advice repeated virtually every time we communicate.
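One way to check for the ownership problem described above is to list any Bayes files that do not belong to the account SpamAssassin runs as. This is a minimal sketch: `foreign_bayes_files` is a made-up helper name, and the directory and user in the example are assumptions based on the account mentioned elsewhere in the thread.

```shell
#!/bin/sh
# foreign_bayes_files DIR USER  (hypothetical helper)
# Lists Bayes database files in DIR that are NOT owned by USER.
# Empty output means every bayes* file belongs to USER.
foreign_bayes_files() {
    find "$1" -name 'bayes*' ! -user "$2"
}

# Example, not run here (directory and user are assumptions):
# foreign_bayes_files /home/gene/.spamassassin gene
```

Running the same check against root's own `~/.spamassassin` would show whether sa-learn, when run as root, has been building a second database that the unprivileged scanner never reads.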
Re: sa-learn --ham ground rules
On Friday 08 February 2008, jdow wrote:
> From: Gene Heskett [EMAIL PROTECTED]
> Sent: Friday, 2008, February 08 16:43
> > On Friday 08 February 2008, Karsten Bräckelmann wrote:
> > > On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
> > > > The command that kmail issues to it is:
> > > > sa-learn --ham /root/Mail/(foldername)/cur
> > >
> > > You're not using root as your ordinary user account, are you!?
> > >
> > >   guenther
> >
> > In fact I do, but I have myself somewhat in a sandbox, as all the mail handling stuff except kmail runs as an unprivileged user, and kmail pulls incoming from that mailbox in /var. I've been doing that for about 2-3 years; started it back at FC2. And running as root since RH5.1. Yeah, I'm an un-repentant old fart.
>
> Gene, how many times have I told you don't do that with Linux? Old fart or not, you broke it. Now you get to fix it.

Ya mean I get to keep all the pieces? Oh goodie.

> You declared you run all mail handling as an unprivileged user. Then you try to run sa-learn as root on root's mailbox. There is rather a problem there if you think about it. Sit and ruminate a few minutes. Seriously - think. Have you considered what you are doing and where at least one hyper-obvious problem lies? If you're not banging your forehead and screaming "Doh!" by now, here is the clue bat. If you've figured it out, DUCK!

Quack?

> SpamAssassin will be using a Bayes database collected as that unprivileged user. It cannot use one generated as root and placed in root's directory structure. The last I knew, you were trying to use per-user Bayes.

That I wouldn't bet on, but spamassassin's kids are running as gene, called into service by procmail also running as gene, so I'd have to assume the applicable Bayes database it's using is the one in /home/gene.

> So sa-learn as root will build the file in a place the unprivileged process cannot access AND will likely leave the file with privileges that prevent access by that unprivileged user. That's issue one for you to fix.
>
> And if you don't fix your errant ways, mama's gonna whup you good.

Good, when you get done I'll buy.

> What size machine are you trying to work on? How deep into your swap file are you when you run sa-learn?

xp-2800, a gig of RAM, 2 of swap. Swap is very rarely touched.

> {^_^} Joanne, ashamed I've known you all these years, Gene. You shame me by not taking advice repeated virtually every time we communicate.

It figures, Joanne would have to see how long she can balance on the soap box. ;-)

At least here in Weston, our 'free speech stump', the stump of a 4+ foot diameter Grand Old Man that was here, probably 50 feet tall, when the war between the states was a current event, has a guard rail now, and since they left it about 4 feet tall, has a set of steps so even an old fart like me can make it up onto it should I feel the urge to make a speech.

However, I see what you are saying, both about perms and locations. Excellent points; I'll see what I can figure out toward making that database belong to me instead of root. Obviously I didn't carry that conversion to user nearly far enough, so I deserve the knuckle rap.

How about I change that kmail filter rule to use:

    runcon -l gene sa-learn --spam /path/to/spam

and:

    runcon -l gene sa-learn --ham /path/to/ham

Now, I note that the /home/gene/.spamassassin/bayes* stuff is carrying a very current time stamp,

    # ls -l
    total 53332
    -rw------- 1 gene gene 20983808 2008-02-08 23:27 auto-whitelist
    -rw-rw-rw- 1 gene gene        6 2008-01-03 02:37 auto-whitelist.mutex
    -rw------- 1 gene gene    26616 2008-02-08 23:27 bayes_journal
    -rw-rw-rw- 1 gene gene   147750 2008-01-03 02:37 bayes.mutex
    -rw------- 1 gene gene 41889792 2008-02-08 23:27 bayes_seen
    -rw------- 1 gene gene  5292032 2008-02-08 23:27 bayes_toks
    -rw-r--r-- 1 gene gene      934 2005-12-14 16:58 init.pre
    -rw-r--r-- 1 gene gene     1164 2006-01-16 13:45 user_prefs
    -rw-r--r-- 1 gene gene     2397 2005-12-14 16:58 v310.pre

so apparently it is doing some self-learning?

Many thanks, girl. I will get it sorted.

-- 
Cheers, Gene
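A note on the `runcon -l gene` idea: as far as I know, `runcon` (from coreutils) runs a command in a different SELinux security context, and `-l` sets the level/range part of that context. It does not switch the Unix user, so files written by sa-learn would still be owned by root. The sketch below uses `sudo -u` instead, which is an assumption on my part (sudo has to be installed and allowed for the calling account). `learn_as` and `SA_LEARN` are made-up names for illustration.

```shell
#!/bin/sh
# SA_LEARN can be overridden for testing; defaults to the real tool.
SA_LEARN=${SA_LEARN:-sa-learn}

# learn_as USER KIND DIR   (hypothetical helper)
# Runs "sa-learn --KIND DIR" as USER, so the Bayes files it writes end up
# owned by USER. KIND is "ham" or "spam". sudo is used only when the
# current user is not already USER (assumes sudo is installed).
learn_as() {
    user=$1
    kind=$2
    dir=$3
    if [ "$(id -un)" = "$user" ]; then
        "$SA_LEARN" "--$kind" "$dir"
    else
        sudo -H -u "$user" "$SA_LEARN" "--$kind" "$dir"
    fi
}

# Example, not run here (paths are placeholders):
# learn_as gene spam /path/to/spam
# learn_as gene ham  /path/to/ham
```

The mail folders being learned from must be readable by that user; a folder under /root/Mail normally is not, so the messages would need to live somewhere the unprivileged account can reach.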
Re: sa-learn --ham ground rules
Gene Heskett sez:
> running as root since RH5.1. Yeah, I'm an un-repentant old fart.

There's no fool like an old fool.

-- 
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
Re: sa-learn --ham ground rules
On Saturday 09 February 2008, John Hardin wrote:
> Gene Heskett sez:
> > running as root since RH5.1. Yeah, I'm an un-repentant old fart.
>
> There's no fool like an old fool.

And that's why they pay me the big bucks when something really goes agley at the TV station, even if I have been semi-retired since mid-2002.

-- 
Cheers, Gene
Re: sa-learn --ham ground rules
From: John Hardin [EMAIL PROTECTED]
Sent: Friday, 2008, February 08 21:03
> Gene Heskett sez:
> > running as root since RH5.1. Yeah, I'm an un-repentant old fart.
>
> There's no fool like an old fool.

I'm close enough to Gene's age and have known him long enough I get the right to rap his knuckles. Hm, in about a year that advances a step to rap his knuckles with an iron bar?

{^_-}