Re: local score ignored
On Thu, 18 Apr 2013, Joe Acquisto-j4 wrote:
> On 4/18/2013 at 7:21 AM, Matus UHLAR - fantomas wrote:
>> On 18.04.13 06:45, Joe Acquisto-j4 wrote:
>>> I was concerned about this:
>>>
>>> [score: 0.4968]
>>
>> This means that BAYES has computed a 49.68% probability that the mail
>> is spam and the rest (50.32%) that it is HAM.
>
> ok
>
>> DO NOT play with BAYES_50 score.
>
> ? What can it hurt?

BAYES_50 is the bayes classifier's way of saying "insufficient data" or "I don't know". Do you really want to assign 3 points for "I don't know"?

--
John Hardin KA7OHZ
http://www.impsec.org/~jhardin/
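To make the "I don't know" point concrete, here is a minimal Python sketch of how a Bayes probability maps to a BAYES_* rule name. Only the 40 to 60% range for BAYES_50 comes from the report quoted in this thread; the other ranges and the handling of exact boundary values are assumptions recalled from the stock rules, so check them against your own rule files before relying on them.

```python
# Upper bound of each bucket and the rule name it maps to.
# ASSUMPTION: only the BAYES_50 range (0.40 to 0.60) is confirmed by the
# thread; the remaining ranges are recalled from the stock rule set.
BAYES_BUCKETS = [
    (0.01, "BAYES_00"),
    (0.05, "BAYES_05"),
    (0.20, "BAYES_20"),
    (0.40, "BAYES_40"),
    (0.60, "BAYES_50"),
    (0.80, "BAYES_60"),
    (0.95, "BAYES_80"),
    (0.99, "BAYES_95"),
    (1.00, "BAYES_99"),
]


def bayes_bucket(probability):
    """Return the BAYES_* name whose range contains the given probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, name in BAYES_BUCKETS:
        # ASSUMPTION: a value exactly on a boundary falls in the lower bucket.
        if probability <= upper:
            return name
    return "BAYES_99"


if __name__ == "__main__":
    # The value from the thread sits almost exactly in the middle.
    print(bayes_bucket(0.4968))
```

A value of 0.4968 lands in the middle bucket, which is why raising the BAYES_50 score penalizes mail the classifier could not decide on.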
Re: local score ignored
On 2013-04-18 19:45, Joe Acquisto-j4 wrote:
>> DO NOT play with BAYES_50 score.
>
> ? What can it hurt?

It can cause significant false positives, since BAYES_50 indicates that there's roughly a 50% chance that the message isn't spam.

--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren
Re: local score ignored
On 4/18/2013 at 7:21 AM, Matus UHLAR - fantomas wrote:
> On 18.04.13 06:45, Joe Acquisto-j4 wrote:
>> I was concerned about this:
>>
>> [score: 0.4968]
>
> This means that BAYES has computed a 49.68% probability that the mail
> is spam and the rest (50.32%) that it is HAM.

ok

> Train your bayes database, if you get many spams with this small score.

All I can do is feed it.

> DO NOT play with BAYES_50 score.

? What can it hurt?

joe a.
sa-exim Terse Rules
I'm new to the list, so if there are easily searchable web archives where I can find this info, please point me to them.

I am running sa-exim with SA 3.3.1. I am trying for the life of me to turn on the terse report options, so that I can see in the email headers how many points are attributed to each rule. This seems to have changed somewhat from version to version, and I can't find anything specific to my version and sa-exim when googling.

TIA.

John Traweek
Re: Need rule to catch lots of font changes
Hi all,

> just write a single detection rule for FONT face= (rawbody or
> uri_detail) and use tflag multiple.
>
> Then meta this with a counter.
>
> eg:
> rawbody __BLAH /
> tflags __BLAH multiple maxhits=21
> meta MULTPL_FONTS __BLAH > 20
> score MULTPL_FONTS 5.0
> describe MULTPL_FONTS At least 20 FONT tags found

I'm trying to adapt this to work with multiple <br> tags, but I must be doing something wrong. I've tried changing it to match just 10 instances of <br>, just for testing. Here's what I have:

rawbody __LOC_BR //
tflags __LOC_BR multiple maxhits=11
meta LOC_MULT_BR > 10
score LOC_MULT_BR 2.0
describe LOC_MULT_BR At least 10 br tags found

Here is the body example I'm working with:

http://www.paren=
ts-partage.org/components/com_content/bestinfo.php?tkogwruam714qhdgbfo">htt=
p://www.parents-partage.org/components/com_content/bestinfo.php?tkogwruam71=
4qhdgbfo<=
br>=
__=
__The stresses.. They just don't care. They're like you on Sunday m=
orning. -- Jerry Griffin

Any idea why this doesn't work as expected? I've pasted an example here:

http://pastebin.com/qprT2Rze

Thanks for any ideas.
Alex

> Best regards,
>
> Alex, from prypiat.
> Yes, I recycle.
>
> On 13-04-14 08:46 PM, Marc Perkel wrote:
>> Anyone want to write a rule to catch this? Lots of font and color
>> changes.
>>
>> treatment for the summer holidays.
>> http://jmb.tw/16xul";>Achieve all your goals and this video will help you.
>> One day
>> color="#e0fffb">a
>> size="+2" color="#e8fffc">younger colleague, one
>> face="Courier, monospace" size="5" color="#ecfbf9">of
>> size="3" color="#e0fefa">my most
>> color="#e0fdf9">intimate
>> color="#f8fffe">friends,
>> size="-3" color="#f6fdfc">who had visited
>> face="Arial, Helvetica, sans-serif" size="1" color="#f0fefc">the patient-
>> face="Century Gothic, Times New Roman" size="1" color="#e8f6f4">Irma-
>> color="#e4f2f0">and
>> size="-2" color="#e8fdfa">her
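The counting logic behind the rule above (count every hit, cap the count at maxhits, then compare against a threshold in a meta rule) can be tried offline. The Python sketch below is only an illustration of that logic, not SpamAssassin's implementation: the tag pattern is an assumption, since the original regex did not survive in this archive, and the quoted-printable decoding step is included because the sample body shows the tag split across a soft line break. Note also that the quoted example puts the sub-rule name before the comparison (`__BLAH > 20`), while the adapted meta line as it appears here has nothing before `> 10`.

```python
import quopri
import re

# ASSUMPTION: the original regex was lost from the archive; this pattern is
# a plausible stand-in that matches <br>, <BR> and <br/>.
BR_TAG = re.compile(rb"<br\s*/?>", re.IGNORECASE)


def count_tag_hits(raw_part, maxhits):
    """Count tag hits in a quoted-printable body part, capped at maxhits."""
    # Decoding removes soft line breaks, so "<=\nbr>" becomes "<br>".
    decoded = quopri.decodestring(raw_part)
    hits = len(BR_TAG.findall(decoded))
    return min(hits, maxhits)


def meta_fires(raw_part, threshold=10, maxhits=11):
    """Mimic a meta rule of the form '__SUBRULE > threshold'."""
    return count_tag_hits(raw_part, maxhits) > threshold


if __name__ == "__main__":
    sample = b"4qhdgbfo<=\nbr>=\n__=\n__The stresses.."
    print(count_tag_hits(sample, maxhits=11))
    print(meta_fires(b"<br>" * 11))
```

With a threshold of 10 and maxhits of 11, the meta condition becomes true only once the eleventh tag has been counted, which is why the cap is set one above the threshold.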
Re: URL spam and RP_MATCHES_RCVD
Hi,

>> we'll continue to monitor the stock values. I didn't realize the
>> corpus could lack the volume to get a more accurate calculation.
>
> It's more a matter of balance and diversity than volume.

Ah, okay, that makes sense.

Somewhat related, but can I ask if anyone has rules to score the junk from constantcontact.com or vresp.com or verticalresponse.com? How would that be included with the masschecks, since so much of it is junk, but really classified as marketing emails?

Those three domains (and other popular email marketing companies) seem to be a legitimate way for spammers to reach their targets with a free pass.

Thanks,
Alex
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new
Hi,

>> Curious: what are your reasons for using Bayes in SQL?
>> Are you sharing the DB among several machines? Or is this a single
>> box/global bayes setup?
>
> Not yet, but that is the ultimate plan (to share the DB across multiple
> servers). Also, I like the idea that the Bayes DB is backed-up
> automatically along with all other databases on the server (we run a
> cron script that performs the dump). Granted, it would be trivial to
> schedule a call to "sa-learn --backup", but storing the data in SQL
> seems more portable and makes it easier to query the data for reporting
> purposes.

I have Bayes in MySQL now, and I think it performs better than with just a flat-file Berkeley DB. I believe it solved some locking/sharing issues I was having too.

I converted to it a few months ago (relearned the corpus from scratch into MySQL) with the intention of sharing it between three systems, but the network latency and general performance between the systems for updates was horrible, so they're all separate databases now.

I'm still a MySQL novice, so I don't doubt someone with more MySQL networking experience could figure out how to share them between systems properly. I thought there would be one master system with two slaves, but instead they all seemed to be shared interactively for every query or update.

For the InnoDB/MyISAM issue, if I'm understanding it correctly, I just edited the SQL file I used to create the database, and I'm using InnoDB now without any issues on v3.3.2.

I believe I used these instructions, with the SQL modifications from above:

http://www200.pair.com/mecham/spam/debian-spamassassin-sql.html

Regards,
Alex
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new
On 4/18/2013 12:26 PM, Axb wrote:
> On 04/18/2013 06:18 PM, Ben Johnson wrote:
>> I have done some searching-around on the string "cannot use bayes on
>> this message; not enough usable tokens found" and have not found
>> anything authoritative regarding what this message might mean and
>> whether or not it can be ignored or if it is symptomatic of a larger
>> Bayes problem.
>
> Curious: what are your reasons for using Bayes in SQL?
> Are you sharing the DB among several machines? Or is this a single
> box/global bayes setup?

Not yet, but that is the ultimate plan (to share the DB across multiple servers). Also, I like the idea that the Bayes DB is backed-up automatically along with all other databases on the server (we run a cron script that performs the dump). Granted, it would be trivial to schedule a call to "sa-learn --backup", but storing the data in SQL seems more portable and makes it easier to query the data for reporting purposes.

Then again, I retain the corpora, so backing-up the DB is only useful for when data needs to be moved from one server or database to another (as moving the corpora seems far less practical).

Are you suggesting that I should scrap SQL and go back to a flat-file DB? Is that the only path to a fix (short of upgrading SA)?

Thanks for your help!

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new
On 04/18/2013 06:18 PM, Ben Johnson wrote:
> I have done some searching-around on the string "cannot use bayes on
> this message; not enough usable tokens found" and have not found
> anything authoritative regarding what this message might mean and
> whether or not it can be ignored or if it is symptomatic of a larger
> Bayes problem.

Curious: what are your reasons for using Bayes in SQL?
Are you sharing the DB among several machines? Or is this a single box/global bayes setup?
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new
On 4/17/2013 10:15 PM, John Hardin wrote:
> On Wed, 17 Apr 2013, Ben Johnson wrote:
>
>> The first post on that page was the key. In particular, adding the
>> following to each MySQL "CREATE TABLE" statement:
>>
>> ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
>
> Please check the SpamAssassin bugzilla to see if this situation is
> already mentioned, and if not, add a bug. This seems pretty critical.

Mark Martinec opened three reports in relation to this issue (quoted from the archive thread cited in my previous post):

[Bug 6624] BayesStore/MySQL.pm fails to update tokens due to MySQL server bug (wrong count of rows affected)
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6624
(^^ Fixed in 3.4 ^^)

[Bug 6625] Bayes SQL schema treats bayes_token.token as char instead of binary, fails chset checks
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6625
(^^ Fixed in 3.4 ^^)

[Bug 6626] Newer MySQL chokes on TYPE=MyISAM syntax
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6626
(^^ Fixed in 3.4 ^^)

My concern now is that I am on 3.3.1, with little control over upgrades. I have read all three bug reports in their entirety and Bug 6624 seems to be a very legitimate concern. To quote Mark in the bug description:

> The effect of the bug with SpamAssassin is that tokens are only able
> to be inserted once, but their counts cannot increase, leading to
> terrible bayes results if the bug is not noticed. Also the conversion
> from db fails, as reported by Dave.
>
> Attached is a patch for lib/Mail/SpamAssassin/BayesStore/MySQL.pm to
> provide a workaround for the MySQL server bug, and improved debug logging.

How can I discern whether or not this bug does, in fact, affect me? Are my Bayes results being crippled as a result of this bug?

> It's possible that there's a good reason the default script still uses
> myISAM. If so, the documentation for this fix should at least be easier
> to find.

If there is a good reason, I have yet to discern what it might be. The third bug from above (Mark's comments, specifically) implies that there is no particular reason for using MyISAM.

I have good reason for wanting to use the InnoDB storage engine, and I have seen no performance hit as a result of so doing. (In fact, performance seems better than with MyISAM in my scripted, once-a-day training setup.)

The perfectly acceptable performance I'm observing could be because a) the InnoDB-related resources allocated to MySQL are more than sufficient, b) the schema that I used has a newly-added INDEX whereas those prior to it did not, or c) I was sure to use the "MySQL" module instead of the "SQL" module with my InnoDB setup:

bayes_store_module Mail::SpamAssassin::BayesStore::MySQL

The bottom line seems to be that for those who have settings like these in their MySQL configurations

> default_storage_engine=InnoDB
> skip-character-set-client-handshake
> collation_server=utf8_unicode_ci
> character_set_server=utf8

it is absolutely necessary to include

ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

at the end of each CREATE TABLE statement (otherwise, a MySQL syntax error results and all Bayes SELECT statements fail).

In any event, I'm a little concerned because while the majority of messages are now tagged with BAYES_* hits, I am now seeing this debug output on a significant percentage of messages ("cannot use bayes on this message; not enough usable tokens found"):

# spamassassin -D -t < /tmp/msg.txt 2>&1 | egrep '(bayes:|whitelist:|AWL)'
--
Apr 18 09:15:36.537 [21797] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x4430388), bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
Apr 18 09:15:36.568 [21797] dbg: bayes: using username: amavis
Apr 18 09:15:36.568 [21797] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x4779778)
Apr 18 09:15:36.580 [21797] dbg: bayes: database connection established
Apr 18 09:15:36.580 [21797] dbg: bayes: found bayes db version 3
Apr 18 09:15:36.581 [21797] dbg: bayes: Using userid: 1
Apr 18 09:15:36.781 [21797] dbg: bayes: corpus size: nspam = 6155, nham = 2342
Apr 18 09:15:36.787 [21797] dbg: bayes: tok_get_all: token count: 176
Apr 18 09:15:36.790 [21797] dbg: bayes: cannot use bayes on this message; not enough usable tokens found
Apr 18 09:15:36.790 [21797] dbg: bayes: not scoring message, returning undef
Apr 18 09:15:37.861 [21797] dbg: timing: total 2109 ms - init: 830 (39.4%), parse: 7 (0.4%), extract_message_metadata: 123 (5.9%), poll_dns_idle: 74 (3.5%), get_uri_detail_list: 2 (0.1%), tests_pri_-1000: 26 (1.3%), compile_gen: 155 (7.4%), compile_eval: 19 (0.9%), tests_pri_-950: 7 (0.3%), tests_pri_-900: 7 (0.3%), tests_pri_-400: 15 (0.7%), check_bayes: 10 (0.5%), tests_pri_0: 1018 (48.3%), dkim_load_modules: 25 (1.2%), check_dkim_signature: 3 (0.2%), check_dkim_adsp: 16 (0.7%), check_spf: 78 (3.7%), check_razo
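When many messages need checking, the bayes debug lines shown above can be summarized with a small script. This Python sketch assumes the log wording is exactly as in the output quoted in this thread (other SpamAssassin versions may word these lines differently); the function name `summarize_bayes_debug` is hypothetical.

```python
import re

# Patterns copied from the debug output quoted in this thread.
# ASSUMPTION: other versions may format these lines differently.
CORPUS_RE = re.compile(r"bayes: corpus size: nspam = (\d+), nham = (\d+)")
TOKENS_RE = re.compile(r"bayes: tok_get_all: token count: (\d+)")
SKIPPED_MARKER = "bayes: cannot use bayes on this message"


def summarize_bayes_debug(log_text):
    """Extract corpus size, token count and the skipped flag from debug output."""
    summary = {"nspam": None, "nham": None, "token_count": None, "skipped": False}
    for line in log_text.splitlines():
        corpus = CORPUS_RE.search(line)
        if corpus:
            summary["nspam"] = int(corpus.group(1))
            summary["nham"] = int(corpus.group(2))
            continue
        tokens = TOKENS_RE.search(line)
        if tokens:
            summary["token_count"] = int(tokens.group(1))
        elif SKIPPED_MARKER in line:
            summary["skipped"] = True
    return summary


if __name__ == "__main__":
    sample = (
        "Apr 18 09:15:36.781 [21797] dbg: bayes: corpus size: nspam = 6155, nham = 2342\n"
        "Apr 18 09:15:36.787 [21797] dbg: bayes: tok_get_all: token count: 176\n"
        "Apr 18 09:15:36.790 [21797] dbg: bayes: cannot use bayes on this message; "
        "not enough usable tokens found\n"
    )
    print(summarize_bayes_debug(sample))
```

Running this over the debug output of a batch of messages gives a quick count of how many were skipped by Bayes and what the token counts looked like when that happened.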
Re: URL spam and RP_MATCHES_RCVD
On Wed, 17 Apr 2013 19:07:39 -0400 Alex wrote:
> we'll continue to monitor the stock values. I didn't realize the
> corpus could lack the volume to get a more accurate calculation.

It's more a matter of balance and diversity than volume.
Re: Using sa-compile are local rules compiled?
On Wed, 17 Apr 2013 21:40:33 +0100 John Horne wrote:
> Hello,
>
> We are running SpamAssassin 3.3.2 on a CentOS 5.9 server. sa-update
> runs via a daily cron job, and we have modified that to run
> sa-compile as well. However, there are some questions:
>
> sa-compile is run without any options. So what I am unsure of is
> whether our local rules (in /etc/mail/spamassassin/local-spam.cf) are
> compiled as well or not?

They are for me, but it's easy enough to test. Find a few *simple* local body rules and grep for them under the "compiled" directory.

> The man page for sa-compile says that the '--siteconfigpath' option
> defaults to /etc/mail/spamassassin which I assume implies that our
> local rules would be compiled?
>
> If they are, and we want to change our local rules, then I assume we
> would have to re-compile all the rules before restarting SpamAssassin?

That's the received wisdom, but I've never seen a definitive reason why.

Compiled rules are intended to co-exist with non-compiled rules, and from my limited testing, rules behave correctly when they are added, removed or modified without recompiling. However, I have seen one case where a rule wasn't working properly and was apparently fixed by a recompile, but I suspect that was a specific bug.

Personally, it wouldn't bother me to have compilation deferred for a few hours.
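The "grep for them under the compiled directory" check suggested above can also be scripted. The Python sketch below is one way to do it, under the assumption that a compiled rule's name appears somewhere in the files sa-compile generates; the function name `rules_found_in_tree` is hypothetical, and the directory to search is deliberately left as a parameter because its location depends on the installation.

```python
import os


def rules_found_in_tree(root, rule_names):
    """Report, for each rule name, whether it occurs in any file under root."""
    found = {name: False for name in rule_names}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                # Unreadable files are skipped rather than treated as errors.
                continue
            for name in rule_names:
                if name.encode("utf-8") in data:
                    found[name] = True
    return found


if __name__ == "__main__":
    import sys

    # Usage: script.py COMPILED_DIR RULE_NAME [RULE_NAME ...]
    if len(sys.argv) >= 3:
        for rule, present in rules_found_in_tree(sys.argv[1], sys.argv[2:]).items():
            print(rule, "found" if present else "missing")
```

As in the advice above, simple body rules make the best probes, since not every rule type is necessarily handled by the compiler.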
Re: local score ignored
On 18.04.13 06:45, Joe Acquisto-j4 wrote:
> I was concerned about this:
>
> [score: 0.4968]

This means that BAYES has computed a 49.68% probability that the mail is spam and the rest (50.32%) that it is HAM.

Train your bayes database, if you get many spams with this small score.

DO NOT play with BAYES_50 score.

--
Matus UHLAR - fantomas
http://www.fantomas.sk/
Re: local score ignored
On 18.04.2013 13:45, Joe Acquisto-j4 wrote:
> On 4/18/2013 at 6:38 AM, Axb wrote:
>> On 04/18/2013 12:23 PM, Joe Acquisto-j4 wrote:
>>> On 4/18/2013 at 6:15 AM, Axb wrote:
>>>> On 04/18/2013 12:11 PM, Joe Acquisto-j4 wrote:
>>>>> I'm missing something.
>>>>>
>>>>> Find a fair amount of missed SPAM showing, among others:
>>>>>
>>>>> * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
>>>>> *     [score: 0.4968]
>>>>>
>>>>> Bayes is way too low, in my HO.
>>>>
>>>> it's obviously not learning enough of whatever it's not scoring higher...
>>>>
>>>>> I am puzzled by the line after it.
>>>>>
>>>>> I set local.cf with: score BAYES_50_BODY 3.6 and restarted sa.
>>>>>
>>>>> Still comes thru with the low score. Am I sticking it in the wrong
>>>>> place? Or can it not be overridden? I do not recall if the odd
>>>>> second line was there before I made the change.
>>>>
>>>> seems like a phatphingers error:
>>>>
>>>> should be:
>>>>
>>>> score BAYES_50 3.6
>>>>
>>>> and NOT:
>>>>
>>>> score BAYES_50_BODY 3.6
>>>
>>> Boy, that phingers guy is a real PITA. More like phathead, as that's
>>> what I intended. But right after I posted I re-examined and said to
>>> myself "self . . ."
>>>
>>> Thanks. Might the odd line be a result of that?
>>
>> What odd line? report looks normal
>
> I was concerned about this:
>
> [score: 0.4968]
>
> On a line by itself. But I see similar in all headers, now that I
> bother to look.
>
> Excuse the morning fog, please.
>
> joe a.

BAYES_50 means: "Can't say if this is SPAM or HAM". I have the score near for it. For HAM I use negative scores, for SPAM positive. But BAYES_50 is not SPAM, nor HAM. 3.6 is way too spammy a score for it, IMHO.
Re: local score ignored
On 4/18/2013 at 6:38 AM, Axb wrote:
> On 04/18/2013 12:23 PM, Joe Acquisto-j4 wrote:
>> On 4/18/2013 at 6:15 AM, Axb wrote:
>>> On 04/18/2013 12:11 PM, Joe Acquisto-j4 wrote:
>>>> I'm missing something.
>>>>
>>>> Find a fair amount of missed SPAM showing, among others:
>>>>
>>>> * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
>>>> *     [score: 0.4968]
>>>>
>>>> Bayes is way too low, in my HO.
>>>
>>> it's obviously not learning enough of whatever it's not scoring higher...
>>>
>>>> I am puzzled by the line after it.
>>>>
>>>> I set local.cf with: score BAYES_50_BODY 3.6 and restarted sa.
>>>>
>>>> Still comes thru with the low score. Am I sticking it in the wrong
>>>> place? Or can it not be overridden? I do not recall if the odd
>>>> second line was there before I made the change.
>>>
>>> seems like a phatphingers error:
>>>
>>> should be:
>>>
>>> score BAYES_50 3.6
>>>
>>> and NOT:
>>>
>>> score BAYES_50_BODY 3.6
>>
>> Boy, that phingers guy is a real PITA. More like phathead, as that's
>> what I intended. But right after I posted I re-examined and said to
>> myself "self . . ."
>>
>> Thanks. Might the odd line be a result of that?
>
> What odd line? report looks normal

I was concerned about this:

[score: 0.4968]

On a line by itself. But I see similar in all headers, now that I bother to look.

Excuse the morning fog, please.

joe a.
Re: local score ignored
On 04/18/2013 12:23 PM, Joe Acquisto-j4 wrote:
> On 4/18/2013 at 6:15 AM, Axb wrote:
>> On 04/18/2013 12:11 PM, Joe Acquisto-j4 wrote:
>>> I'm missing something.
>>>
>>> Find a fair amount of missed SPAM showing, among others:
>>>
>>> * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
>>> *     [score: 0.4968]
>>>
>>> Bayes is way too low, in my HO.
>>
>> it's obviously not learning enough of whatever it's not scoring higher...
>>
>>> I am puzzled by the line after it.
>>>
>>> I set local.cf with: score BAYES_50_BODY 3.6 and restarted sa.
>>>
>>> Still comes thru with the low score. Am I sticking it in the wrong
>>> place? Or can it not be overridden? I do not recall if the odd
>>> second line was there before I made the change.
>>
>> seems like a phatphingers error:
>>
>> should be:
>>
>> score BAYES_50 3.6
>>
>> and NOT:
>>
>> score BAYES_50_BODY 3.6
>
> Boy, that phingers guy is a real PITA. More like phathead, as that's
> what I intended. But right after I posted I re-examined and said to
> myself "self . . ."
>
> Thanks. Might the odd line be a result of that?

What odd line? report looks normal
Re: local score ignored
On 4/18/2013 at 6:15 AM, Axb wrote:
> On 04/18/2013 12:11 PM, Joe Acquisto-j4 wrote:
>> I'm missing something.
>>
>> Find a fair amount of missed SPAM showing, among others:
>>
>> * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
>> *     [score: 0.4968]
>>
>> Bayes is way too low, in my HO.
>
> it's obviously not learning enough of whatever it's not scoring higher...
>
>> I am puzzled by the line after it.
>>
>> I set local.cf with: score BAYES_50_BODY 3.6 and restarted sa.
>>
>> Still comes thru with the low score. Am I sticking it in the wrong
>> place? Or can it not be overridden? I do not recall if the odd
>> second line was there before I made the change.
>
> seems like a phatphingers error:
>
> should be:
>
> score BAYES_50 3.6
>
> and NOT:
>
> score BAYES_50_BODY 3.6

Boy, that phingers guy is a real PITA. More like phathead, as that's what I intended. But right after I posted I re-examined and said to myself "self . . ."

Thanks. Might the odd line be a result of that?

joea.
Re: local score ignored
On 04/18/2013 12:11 PM, Joe Acquisto-j4 wrote:
> I'm missing something.
>
> Find a fair amount of missed SPAM showing, among others:
>
> * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
> *     [score: 0.4968]
>
> Bayes is way too low, in my HO.

it's obviously not learning enough of whatever it's not scoring higher...

> I am puzzled by the line after it.
>
> I set local.cf with: score BAYES_50_BODY 3.6 and restarted sa.
>
> Still comes thru with the low score. Am I sticking it in the wrong
> place? Or can it not be overridden? I do not recall if the odd
> second line was there before I made the change.

seems like a phatphingers error:

should be:

score BAYES_50 3.6

and NOT:

score BAYES_50_BODY 3.6
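A typo like this is easy to miss because a score line for a misspelled rule name simply never matches anything in a report. As a rough illustration, the Python sketch below flags `score` lines whose rule name is not in a set of known names. It is a simplified stand-in, not how SpamAssassin parses its configuration: the set of known rules is supplied by the caller and the function name `unknown_score_targets` is hypothetical. SpamAssassin's own `spamassassin --lint` remains the first thing to run on a changed local.cf.

```python
def unknown_score_targets(config_text, known_rules):
    """Return rule names used in 'score' lines that are not in known_rules."""
    unknown = []
    for raw_line in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "score" and parts[1] not in known_rules:
            unknown.append(parts[1])
    return unknown


if __name__ == "__main__":
    # ASSUMPTION: a tiny hand-written rule set, just for the demonstration.
    known = {"BAYES_50"}
    local_cf = "score BAYES_50_BODY 3.6\n"
    print(unknown_score_targets(local_cf, known))
```

In the report line quoted in this thread, "BODY:" is part of the description that follows the rule name, not part of the name itself, which is how the two ended up glued together in local.cf.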
local score ignored
I'm missing something.

Find a fair amount of missed SPAM showing, among others:

* 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
*     [score: 0.4968]

Bayes is way too low, in my HO.

I am puzzled by the line after it.

I set local.cf with: score BAYES_50_BODY 3.6 and restarted sa.

Still comes thru with the low score. Am I sticking it in the wrong place? Or can it not be overridden? I do not recall if the odd second line was there before I made the change.
Re: URL spam and RP_MATCHES_RCVD
Hello Kris,

Monday, April 15, 2013, 8:34:55 PM, you wrote:

KD> There seems to be a lame server:

Still is!

dig +short 2.3.3.updates.spamassassin.org txt @ns.hyperreal.org.
"1462428"

dig +short 2.3.3.updates.spamassassin.org txt @a.auth-ns.sonic.net.
"1468800"

--
Best regards,
Niamh
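The comparison made by hand above (two name servers returning different TXT values for the same name) can be expressed as a small check. This Python sketch works on answers that have already been collected, for example with the dig commands shown, and performs no DNS queries itself. It assumes the TXT value is a number that only ever increases, so the server with the smaller value is the one lagging behind; the function name `stale_servers` is hypothetical.

```python
def stale_servers(answers):
    """Return the servers whose numeric TXT answer is older than the newest one.

    `answers` maps a name server to the TXT value it returned, e.g. as
    printed by `dig +short ... txt @server`.
    """
    # dig prints TXT data in double quotes, so strip them before converting.
    numeric = {server: int(value.strip('"')) for server, value in answers.items()}
    newest = max(numeric.values())
    return sorted(server for server, value in numeric.items() if value < newest)


if __name__ == "__main__":
    observed = {
        "ns.hyperreal.org.": '"1462428"',
        "a.auth-ns.sonic.net.": '"1468800"',
    }
    print(stale_servers(observed))
```

With the two answers from the message above, the check points at ns.hyperreal.org. as the server returning the older value.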