Re: [sniffer] Possible blip?
At 01:42 PM 5/21/2004, you wrote:

> Pete,
>
> Our Hold range has returned to more normal territory on Thursday. Here's the stats from
>
> One of my thoughts regarding minimum rule strengths and grace periods is that all groups aren't necessarily the same. For instance Nigerian scams are low volume and sporadic, and my system performs the worst on these things. Maybe lower rule strengths and longer grace periods make much more sense for the Phishing category than they do for many other categories, for instance. Is that possible?

These are definitely some things to look at - great food for new research projects. There is a great diversity - luckily the scanning engine has a huge amount of headroom so most of the time we don't need to tune things very precisely. In any of the categories you mention we see some rules die immediately, and others seem to live on forever - often without a great deal of reason for either case.

The fact that your hold range returned after we adjusted the rule strength calculation window is a good indication that the relevant tuning parameter is minimum rule strength. I noted that the previous adjustment (changing the window from 45 to 35 days) happened precisely one month ago. This strongly suggested that we were seeing a "wave front" of sorts pass through the tuning system - so on a hunch I put it back to 45. Your report helps to support this conjecture.

The grace period value has the greatest effect early on in a rule's life cycle and probably shouldn't be extended beyond about 10 days. The design of the grace period feature is that it gives a new rule time for its rule strength to rise to the minimum threshold. After that it's all about the performance of the rule. This sets up a competitive environment in the system.

Reaching a threshold of 1.0 currently requires that at least 19 messages fail on that rule within the analysis window and on one of the systems that are providing logs for analysis.
With about 110 logs being consistently reported there are plenty of chances for 19 hits to happen.

[ An "ordinary" reporting system processes about 1300 messages per hour with sniffer spending about 190ms of computing time per message (or about 7% of the available computing time). In 5 days a rule has about 17,160,000 opportunities to "kill" a message (1300 messages per hour x 24 hours x 5 days x 110 systems). To stay alive, a rule need only achieve a kill about .00011655% (one ten thousandth of a percent) of the time. Of course, these numbers are a lot like the average US family having 2.3 kids - ever seen .3 of a kid? --- but the scale of the numbers seems right. ]

It could be argued that if a rule can't account for at least that many hits across 110 systems in 5 days then it's not going to be missed... The counter to this argument is that the spammers are driving toward diversity to make filtering systems of all types difficult to train and maintain -- as you noted, half of the active rules in the default configuration are in this very low strength range.

> I also looked up the rule strengths on your site and found that about 50%, or maybe more, have a strength below 1, and maybe lowering that is worth testing out so long as I don't massively increase the number of records. I do think though that I would like to test out extending the grace period. Most of my false positives are not on things that this would affect, and that might give niche sources a little extra coverage if I understand things correctly.

Possibly - but I think an adjustment in the minimum rule strength will probably suffice given the sensitivity at that range. For example, if you adjust your minimum rule strength to 0.8 then only 10 credited kills would be required over a period of 5 days on 110 systems in order to push the rule above the strength threshold. Thereafter it would remain in place for at least 45 days (with the current settings) --- each of those days providing another opportunity to increase or maintain its strength...
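As a rough check on the bracketed estimate above, the quoted figures can be multiplied out. This is only a sketch under stated assumptions: it takes "opportunities" to mean every message scanned by all 110 reporting systems during the 5-day grace period, and it assumes the quoted kill rate corresponds to 20 credited hits (the figure mentioned later in this message). The variable names are illustrative and are not part of Message Sniffer.

```python
# Figures quoted in the message.
MESSAGES_PER_HOUR = 1300          # per "ordinary" reporting system
SCAN_SECONDS_PER_MESSAGE = 0.190  # sniffer computing time per message
REPORTING_SYSTEMS = 110
GRACE_PERIOD_DAYS = 5

# Assumption: the quoted rate is based on 20 hits, not 19.
ASSUMED_HITS = 20

# Share of each hour spent scanning on one system.
cpu_share = MESSAGES_PER_HOUR * SCAN_SECONDS_PER_MESSAGE / 3600

# Messages seen by all reporting systems during the grace period.
opportunities = MESSAGES_PER_HOUR * 24 * GRACE_PERIOD_DAYS * REPORTING_SYSTEMS

# Kill rate (in percent) a rule needs in order to collect the assumed hits.
required_rate_percent = 100 * ASSUMED_HITS / opportunities

print(f"CPU share per system: {cpu_share:.1%}")
print(f"Opportunities in {GRACE_PERIOD_DAYS} days: {opportunities:,}")
print(f"Required kill rate: {required_rate_percent:.8f}%")
```

Under these assumptions the product is 17,160,000 opportunities, the CPU share is just under 7%, and 20 hits reproduce the quoted .00011655%; with 19 hits the rate would be slightly lower.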
There is also another mechanism at work here --- our core system scans every presumed ham message one more time with every rule in the system (min rule strength 0). The log from this scan is injected into the normal analysis so that if a message matching a deactivated rule reaches our system through any path the strength for that rule will be raised above 0. The second stage of the reactivation process then kicks in because our system normally scans messages with a minimum rule strength of 0.1 - so any messages that were being missed will continue to rise in strength if they are seen in any volume in our spam traps or submitted spam. Once we see 20 instances every system will begin using the reactivated rule... Some systems will begin even before that because they are using more sensitive settings in their rulebases - this fact helps to accelerate the process. Anyway, a long story short - I think the first thing to try is adjusting the Minimum Rule Strength. This is by far the most sensitive setting - though the two do interact dynamically - es
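The interaction between the two tuning parameters discussed in this message can be sketched roughly as follows. This is a hypothetical illustration, not Message Sniffer's actual selection code: the `Rule` class, its fields, and `is_active` are invented names, and the sketch ignores how strength is computed over the analysis window. The defaults of 1.0 and 5 days are the ones quoted elsewhere in this thread.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical record for one rule; not Sniffer's real data model."""
    age_days: float   # days since the rule was created
    strength: float   # current rule strength from log analysis


def is_active(rule: Rule, min_strength: float = 1.0, grace_days: float = 5) -> bool:
    """Decide whether a rule would be included in a rulebase.

    Assumption: a rule is kept unconditionally during its grace period,
    and afterwards only while its strength meets the minimum threshold.
    """
    if rule.age_days <= grace_days:
        return True
    return rule.strength >= min_strength


# A new, still-weak rule survives on the grace period alone.
print(is_active(Rule(age_days=2, strength=0.1)))
# An older rule at strength 0.9 is dropped at the default threshold...
print(is_active(Rule(age_days=10, strength=0.9)))
# ...but kept when the minimum rule strength is lowered to 0.8.
print(is_active(Rule(age_days=10, strength=0.9), min_strength=0.8))
```

Seen this way, lowering the minimum strength affects every rule past its grace period, while extending the grace period only affects rules early in their life cycle, which matches the advice above to try the minimum rule strength first.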
Re: [sniffer] Possible blip?
Interesting. Are you searching for 2 character pairs with GB2312? Scott Fisher Director of IT Farm Progress Companies >>> [EMAIL PROTECTED] 05/21/04 01:46PM >>> Scott, Regarding my Cyrillic and Chinese filters, I did a review of a full week's held spam, looking for foreign languages and patterns to tag. I found from other research that the primary Chinese characterset, GB2312, contains the Western Latin characterset, and so someone could send an E-mail with this characterset defined and still have English as the message. Because of this I do more than just look for the offending characterset, I've built a combo filter that looks for both high bit characters such as ¥ as well as body or header hits for encoding of GB2312 (Chinese/Korean) or Windows-1251 (Cyrillic). I also have Declude END statements for appearances of US-ASCII and ISO-8859-1, so messages like this one that are referencing such patterns won't trip the filter. It seems to be stopping about 80% to 90% of the stuff, but I'm guessing that the stuff that is getting through didn't hit one of the high bit characters in my filter and I might need to simply expand my list a bit. Unfortunately I have no idea what characters are most common, so I'm just eyeballing it from sources. I had one false positive on a Yahoo Groups posting that referenced 163.com, a Chinese free Web mail provider that inserts Chinese language footers. The message was in English, but encoded in GB2312 and didn't indicate any sign of English besides the actual text. Because of this, I might throw in an exception for the word "the " (followed by a space) just as a test to see if text in English is present, but I have to review that. This message was also BASE64 encoded and that might be an appropriate exception??? The last pattern that I might look at is using the new MailPolice test for identifying Web-mail providers, and excepting them from the filter because they have issues with encoding languages I've found. Hope this helps. 
Matt Scott Fisher wrote: >2 thoughts from me: > >1. Right on on the Nigerian scams, possible keeping these rules longer. As I was >forwarding out a Nigerian scam to the spam mailbox, I too wondered how long the >Nigerian rules were kept in play. I might also add Nigeria's twin sister the >International Lottery spam and Stock Spams might also be kept longer. I noticed an >increase in the Stock spams this week. > >2. I've been tracking different character sets for a couple of weeks, the Chinese, >Cyrillic and Korean look promising. I get false hits on Greek, Thai, and Vietnamese >Headers. > >Scott Fisher >Director of IT >Farm Progress Companies > > > [EMAIL PROTECTED] 05/21/04 12:42PM >>> >Pete, > >Our Hold range has returned to more normal territory on Thursday. >Here's the stats from the week as a whole on what has been very >consistent traffic. Out of all E-mail processed, both good and bad, the >%Hold represents what scored between 10-24 points on our system and >needed review, the %Sniffer represents all Sniffer hits except for Gray, >the %Spam is what we scanned and didn't deliver (generally about 99.8% >of spam is caught at a score of 10 which this is based on), and the >Sniffer/Spam is the percentage of Sniffer hits as a portion of messages >scoring 10 or more. > >Day %Hold%Sniffer%SpamSniffer/Spam >Mon: 1.86% 77.27% 80.37% 96.14% >Tue: 2.83% 74.53% 79.37% 93.39% >Wed: 2.13% 77.60% 79.66% 97.41% >Thur:1.95% 76.50% 80.66% 94.84% > >The only change that we made to our system was to add two smaller >domains later in the week, and we introduced filters for Cyrillic and >Chinese languages on Wednesday morning which have cut our hold file down >by 0.38 percentage points on Thursday, which explains how our %Hold is >lower on than on Wednesday with a lower Sniffer hit rate on spam. 
> >I did note two high volume untagged static spammers on Tuesday that we >blacklisted locally, and that combined with the increase in Sniffer >change rates (spam storm) might account for the changes that I saw. I >am wondering though about the recommendations that you have made for >possibly fine tuning our rule base. Again though, please keep in mind >that I still feel that performance is overall very, very good. > >One of my thoughts regarding minimum rule strengths and grace periods is >that all groups aren't necessarily the same. For instance Nigerian >scams are low volume and sporadic, and my system performs the worst on >these things. Maybe lower rule strengths and longer grace periods makes >much more sense for the Phishing category than it does for many other >categories for instance. Is that possible? > >I also looked up the rule strengths on your site and found that about >50%, or maybe more, have a strength below 1, and maybe lowering that is >worth testing out so
Re: [sniffer] Possible blip?
Scott, Regarding my Cyrillic and Chinese filters, I did a review of a full week's held spam, looking for foreign languages and patterns to tag. I found from other research that the primary Chinese characterset, GB2312, contains the Western Latin characterset, and so someone could send an E-mail with this characterset defined and still have English as the message. Because of this I do more than just look for the offending characterset, I've built a combo filter that looks for both high bit characters such as ¥ as well as body or header hits for encoding of GB2312 (Chinese/Korean) or Windows-1251 (Cyrillic). I also have Declude END statements for appearances of US-ASCII and ISO-8859-1, so messages like this one that are referencing such patterns won't trip the filter. It seems to be stopping about 80% to 90% of the stuff, but I'm guessing that the stuff that is getting through didn't hit one of the high bit characters in my filter and I might need to simply expand my list a bit. Unfortunately I have no idea what characters are most common, so I'm just eyeballing it from sources. I had one false positive on a Yahoo Groups posting that referenced 163.com, a Chinese free Web mail provider that inserts Chinese language footers. The message was in English, but encoded in GB2312 and didn't indicate any sign of English besides the actual text. Because of this, I might throw in an exception for the word "the " (followed by a space) just as a test to see if text in English is present, but I have to review that. This message was also BASE64 encoded and that might be an appropriate exception??? The last pattern that I might look at is using the new MailPolice test for identifying Web-mail providers, and excepting them from the filter because they have issues with encoding languages I've found. Hope this helps. Matt Scott Fisher wrote: 2 thoughts from me: 1. Right on on the Nigerian scams, possible keeping these rules longer. 
As I was forwarding out a Nigerian scam to the spam mailbox, I too wondered how long the Nigerian rules were kept in play. I might also add Nigeria's twin sister the International Lottery spam and Stock Spams might also be kept longer. I noticed an increase in the Stock spams this week. 2. I've been tracking different character sets for a couple of weeks, the Chinese, Cyrillic and Korean look promising. I get false hits on Greek, Thai, and Vietnamese Headers. Scott Fisher Director of IT Farm Progress Companies [EMAIL PROTECTED] 05/21/04 12:42PM >>> Pete, Our Hold range has returned to more normal territory on Thursday. Here's the stats from the week as a whole on what has been very consistent traffic. Out of all E-mail processed, both good and bad, the %Hold represents what scored between 10-24 points on our system and needed review, the %Sniffer represents all Sniffer hits except for Gray, the %Spam is what we scanned and didn't deliver (generally about 99.8% of spam is caught at a score of 10 which this is based on), and the Sniffer/Spam is the percentage of Sniffer hits as a portion of messages scoring 10 or more. Day %Hold%Sniffer%SpamSniffer/Spam Mon: 1.86% 77.27% 80.37% 96.14% Tue: 2.83% 74.53% 79.37% 93.39% Wed: 2.13% 77.60% 79.66% 97.41% Thur:1.95% 76.50% 80.66% 94.84% The only change that we made to our system was to add two smaller domains later in the week, and we introduced filters for Cyrillic and Chinese languages on Wednesday morning which have cut our hold file down by 0.38 percentage points on Thursday, which explains how our %Hold is lower on than on Wednesday with a lower Sniffer hit rate on spam. I did note two high volume untagged static spammers on Tuesday that we blacklisted locally, and that combined with the increase in Sniffer change rates (spam storm) might account for the changes that I saw. I am wondering though about the recommendations that you have made for possibly fine tuning our rule base. 
Again though, please keep in mind that I still feel that performance is overall very, very good. One of my thoughts regarding minimum rule strengths and grace periods is that all groups aren't necessarily the same. For instance Nigerian scams are low volume and sporadic, and my system performs the worst on these things. Maybe lower rule strengths and longer grace periods makes much more sense for the Phishing category than it does for many other categories for instance. Is that possible? I also looked up the rule strengths on your site and found that about 50%, or maybe more, have a strength below 1, and maybe lowering that is worth testing out so long as I don't massively increase the number of records. I do think though that I would like to test out extending the grace period. Most of my false positives are not on things that this would affect, and that migh
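The combination filter Matt describes in this message (a target character set plus high-bit characters, with exemptions for US-ASCII and ISO-8859-1) could be sketched as below. This is not Declude filter syntax and not Matt's actual filter: it is a minimal Python illustration that works on raw message bytes, the function names are made up, and it does not decode BASE64 bodies, which is one of the open questions he raises.

```python
import re

# Character sets the filter targets, and the ones treated as exemptions.
TARGET_CHARSETS = re.compile(rb"gb2312|windows-1251", re.IGNORECASE)
EXEMPT_CHARSETS = re.compile(rb"us-ascii|iso-8859-1", re.IGNORECASE)


def has_high_bit_bytes(raw: bytes) -> bool:
    """True if any byte has its high bit set (value 0x80 or above)."""
    return any(byte >= 0x80 for byte in raw)


def should_tag(raw: bytes) -> bool:
    """Tag only when a target charset is named AND high-bit bytes appear.

    Any mention of an exempt charset ends the test early, so a message
    that merely discusses these patterns does not trip the filter.
    """
    if EXEMPT_CHARSETS.search(raw):
        return False
    return bool(TARGET_CHARSETS.search(raw)) and has_high_bit_bytes(raw)


header = b'Content-Type: text/plain; charset="GB2312"\r\n\r\n'
chinese = header + "你好".encode("gb2312")   # GB2312-encoded body
english = header + b"Hello there"            # GB2312 declared, plain ASCII body

print(should_tag(chinese))
print(should_tag(english))
```

The second example is the case Matt is guarding against: a message that declares GB2312 but contains only Latin text is not tagged on the charset alone.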
Re: [sniffer] Possible blip?
2 thoughts from me: 1. Right on on the Nigerian scams, possible keeping these rules longer. As I was forwarding out a Nigerian scam to the spam mailbox, I too wondered how long the Nigerian rules were kept in play. I might also add Nigeria's twin sister the International Lottery spam and Stock Spams might also be kept longer. I noticed an increase in the Stock spams this week. 2. I've been tracking different character sets for a couple of weeks, the Chinese, Cyrillic and Korean look promising. I get false hits on Greek, Thai, and Vietnamese Headers. Scott Fisher Director of IT Farm Progress Companies >>> [EMAIL PROTECTED] 05/21/04 12:42PM >>> Pete, Our Hold range has returned to more normal territory on Thursday. Here's the stats from the week as a whole on what has been very consistent traffic. Out of all E-mail processed, both good and bad, the %Hold represents what scored between 10-24 points on our system and needed review, the %Sniffer represents all Sniffer hits except for Gray, the %Spam is what we scanned and didn't deliver (generally about 99.8% of spam is caught at a score of 10 which this is based on), and the Sniffer/Spam is the percentage of Sniffer hits as a portion of messages scoring 10 or more. Day %Hold%Sniffer%SpamSniffer/Spam Mon: 1.86% 77.27% 80.37% 96.14% Tue: 2.83% 74.53% 79.37% 93.39% Wed: 2.13% 77.60% 79.66% 97.41% Thur:1.95% 76.50% 80.66% 94.84% The only change that we made to our system was to add two smaller domains later in the week, and we introduced filters for Cyrillic and Chinese languages on Wednesday morning which have cut our hold file down by 0.38 percentage points on Thursday, which explains how our %Hold is lower on than on Wednesday with a lower Sniffer hit rate on spam. I did note two high volume untagged static spammers on Tuesday that we blacklisted locally, and that combined with the increase in Sniffer change rates (spam storm) might account for the changes that I saw. 
I am wondering though about the recommendations that you have made for possibly fine tuning our rule base. Again though, please keep in mind that I still feel that performance is overall very, very good. One of my thoughts regarding minimum rule strengths and grace periods is that all groups aren't necessarily the same. For instance Nigerian scams are low volume and sporadic, and my system performs the worst on these things. Maybe lower rule strengths and longer grace periods makes much more sense for the Phishing category than it does for many other categories for instance. Is that possible? I also looked up the rule strengths on your site and found that about 50%, or maybe more, have a strength below 1, and maybe lowering that is worth testing out so long as I don't massively increase the number of records. I do think though that I would like to test out extending the grace period. Most of my false positives are not on things that this would affect, and that might give niche sources a little extra coverage if I understand things correctly. I'll follow your directions and contact you directly regarding any affirmative changes, but I thought it might be beneficial to keep this discussion public since some other stats hounds might find this information to be of use :) If you can glean anything from the numbers that I gave you, please add your thoughts. Thanks, Matt Pete McNeil wrote: > At 05:00 PM 5/19/2004, you wrote: > > > >> I haven't yet upgraded to the most recent release, I'm still on the >> prior beta. I'll probably do that this evening. I tend to wait on >> upgrades until there has been enough time for bugs to surface unless >> I am already looking for a fix. I'm sure that the extra verification >> of the rulebase will help prevent the potential of problems, and I >> guess this has the possibility of being caused by a bit of corrupted >> data, though that's probably reaching. > > > There were no substantive changes from the beta to the production > version. 
Largely just a removal of monitoring code. > >> Again, regardless if there was a blip, Sniffer still does a wonderful >> job of tagging lots and lots of E-mail, just not quite as much as the >> day before. > > > Last night I was able to adjust the rule strength analysis window back > to it's original settings. About 5 days of data were lost - but those > days will be recovered quickly. Please let me know if this adjustment > improved your conditions. > > I've noted that on a number of other lists there seem to be posts > about a sudden increase in spam over the past few days. We are > definitely seeing this also - approximately a 25% or more increase in > new rule additions in the past 4 days: > > http://www.sortmonster.com/MessageSniffer/Performance/ChangeRates.jsp > > Specifically note from about 4 days ago... > > >Days Ago Adjustments >
Re: [sniffer] Possible blip?
Pete,

Our Hold range has returned to more normal territory on Thursday. Here's the stats from the week as a whole on what has been very consistent traffic. Out of all E-mail processed, both good and bad, the %Hold represents what scored between 10-24 points on our system and needed review, the %Sniffer represents all Sniffer hits except for Gray, the %Spam is what we scanned and didn't deliver (generally about 99.8% of spam is caught at a score of 10 which this is based on), and the Sniffer/Spam is the percentage of Sniffer hits as a portion of messages scoring 10 or more.

Day    %Hold   %Sniffer  %Spam    Sniffer/Spam
Mon:   1.86%   77.27%    80.37%   96.14%
Tue:   2.83%   74.53%    79.37%   93.39%
Wed:   2.13%   77.60%    79.66%   97.41%
Thur:  1.95%   76.50%    80.66%   94.84%

The only change that we made to our system was to add two smaller domains later in the week, and we introduced filters for Cyrillic and Chinese languages on Wednesday morning which have cut our hold file down by 0.38 percentage points on Thursday, which explains how our %Hold is lower on Thursday than on Wednesday with a lower Sniffer hit rate on spam.

I did note two high volume untagged static spammers on Tuesday that we blacklisted locally, and that combined with the increase in Sniffer change rates (spam storm) might account for the changes that I saw. I am wondering though about the recommendations that you have made for possibly fine tuning our rule base. Again though, please keep in mind that I still feel that performance is overall very, very good.

One of my thoughts regarding minimum rule strengths and grace periods is that all groups aren't necessarily the same. For instance Nigerian scams are low volume and sporadic, and my system performs the worst on these things. Maybe lower rule strengths and longer grace periods make much more sense for the Phishing category than they do for many other categories, for instance. Is that possible?
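The Sniffer/Spam column in the table above can be recomputed from the other two columns. The sketch below assumes the column is simply %Sniffer divided by %Spam, which is an interpretation of Matt's description rather than something he states as a formula; the helper name is illustrative.

```python
# (day, %Sniffer, %Spam, reported Sniffer/Spam) as posted in the table.
ROWS = [
    ("Mon", 77.27, 80.37, 96.14),
    ("Tue", 74.53, 79.37, 93.39),
    ("Wed", 77.60, 79.66, 97.41),
    ("Thur", 76.50, 80.66, 94.84),
]


def sniffer_share(sniffer_pct: float, spam_pct: float) -> float:
    """Sniffer hits as a percentage of all messages scored as spam."""
    return round(100 * sniffer_pct / spam_pct, 2)


for day, sniffer_pct, spam_pct, reported in ROWS:
    computed = sniffer_share(sniffer_pct, spam_pct)
    note = "" if abs(computed - reported) < 0.01 else "  <- differs from posted value"
    print(f"{day:5s} computed {computed:6.2f}%  posted {reported:6.2f}%{note}")
```

Under that assumption the Monday, Wednesday and Thursday rows are reproduced exactly, while Tuesday comes out at about 93.90% rather than the posted 93.39%. That could be a transposed digit or a slightly different basis for that day; the thread does not say which.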
I also looked up the rule strengths on your site and found that about 50%, or maybe more, have a strength below 1, and maybe lowering that is worth testing out so long as I don't massively increase the number of records. I do think though that I would like to test out extending the grace period. Most of my false positives are not on things that this would affect, and that might give niche sources a little extra coverage if I understand things correctly.

I'll follow your directions and contact you directly regarding any affirmative changes, but I thought it might be beneficial to keep this discussion public since some other stats hounds might find this information to be of use :)

If you can glean anything from the numbers that I gave you, please add your thoughts.

Thanks,

Matt

Pete McNeil wrote:

> At 05:00 PM 5/19/2004, you wrote:
>
>> I haven't yet upgraded to the most recent release, I'm still on the prior beta. I'll probably do that this evening. I tend to wait on upgrades until there has been enough time for bugs to surface unless I am already looking for a fix. I'm sure that the extra verification of the rulebase will help prevent the potential of problems, and I guess this has the possibility of being caused by a bit of corrupted data, though that's probably reaching.
>
> There were no substantive changes from the beta to the production version. Largely just a removal of monitoring code.
>
>> Again, regardless if there was a blip, Sniffer still does a wonderful job of tagging lots and lots of E-mail, just not quite as much as the day before.
>
> Last night I was able to adjust the rule strength analysis window back to its original settings. About 5 days of data were lost - but those days will be recovered quickly. Please let me know if this adjustment improved your conditions.
>
> I've noted that on a number of other lists there seem to be posts about a sudden increase in spam over the past few days.
> We are definitely seeing this also - approximately a 25% or more increase in new rule additions in the past 4 days:
>
> http://www.sortmonster.com/MessageSniffer/Performance/ChangeRates.jsp
>
> Specifically note from about 4 days ago...
>
> Days Ago  Adjustments
> ---
> 0         356
> 1         508
> 2         391
> 3         410
> 4         410
> 5         326
> 6         309
> 7         371
> 8         292
> 9         347
> 10        309
>
> ( 5-10 : 1954/6 -> 325.67, 0-5 : 2075/5 -> 415, 325.67/415 -> 78.47 )
>
> Note that day 0 is not complete. So applying a "fudge factor" 78.4 _looks like_ 75%. Besides, 92% of statistics are made up on the spot anyway %^b
>
> I think a number of things are combined here... I just want to get a good handle on them and make sure we are doing the best we can.
>
> I've noted, Matt, that your rulebase tuning parameters are set at the defaults. If you would like to adjust these to be more aggressive then please let me know off list (support@).
>
> More aggressive settings will keep more rules active in yo
RE: [sniffer] Possible blip?
At 06:38 PM 5/20/2004, you wrote:

> Crew,
>
> I reported this speed issue before, but despite very intensive debugging and testing, we have not found an external cause (meaning: not sniffer) for the following:

> But, now comes the big mystery: when persistent mode is ON, it takes a lot more time to execute (while max polling is only 50ms!)
>
> 0,"2004-05-20 23:48:41",md5845373.msg,827,812,15,0,0,0,3607,1
> 0,"2004-05-20 23:48:52",md5845374.msg,842,812,0,0,0,0,3833,1
> 0,"2004-05-20 23:51:15",md5845375.msg,936,874,0,0,0,0,9560,1
> 0,"2004-05-20 23:51:35",md5845376.msg,889,859,15,0,0,0,26387,0
> 0,"2004-05-20 23:53:21",md5845377.msg,937,922,0,15,0,15,1922,0
>
> Which averages at 850 ms! While I expected 45 + 25 ms (to compensate for average waiting time) = 70 ms!
>
> Pete, could you please check why this is happening (particularly in code OUTSIDE what's measured and logged)? If you can't find anything, I'll ask my colleague to come up with a timing program, which I would like to release on this list so other ppl can check how long it really takes to execute sniffer (measured from 'the outside').

As I recall when this last came up the solution turned out to be an on-access virus scanner that was introducing the extra delays. Turning off and/or adjusting the on-access virus scanner solved the timing problem.

The theory goes that the MDaemon CF is single threaded so when Sniffer runs normally there will only be one instance at once, and as a result each instance loads its own rulebase and scans its own message... this results in two file reads and no write operations.

With the persistent sniffer instance running as a server, there are several additional file creation, write, and access events per message. Each causes the on-access scanner to intervene and thereby introduce the additional timing delays.

The "transparent" way on-access virus scanners interfere with file operations accounts for the odd placement of the additional time.

... as I said, in theory ...

Hope this helps,
_M
RE: [sniffer] Possible blip?
Crew,

I reported this speed issue before, but despite very intensive debugging and testing, we have not found an external cause (meaning: not sniffer) for the following:

When I use sniffer without the persistent flag, I get this log:

h0t861s4 20040520214718 md5845369.msg 125 16 Clean 0 0 0 2844 40
h0t861s4 20040520214718 md5845370.msg 110 15 Clean 0 0 0 2747 36
h0t861s4 20040520214804 md5845371.msg 109 16 Match 109406 62 43 93 43
h0t861s4 20040520214804 md5845371.msg 109 16 Match 115560 58 2286 2307 43
h0t861s4 20040520214804 md5845371.msg 109 16 Final 115560 58 0 3580 43
h0t861s4 20040520214825 md5845372.msg 110 15 Match 29048 52 2757 2788 46
h0t861s4 20040520214825 md5845372.msg 110 15 Match 122523 52 2930 2942 46
h0t861s4 20040520214825 md5845372.msg 110 15 Match 122017 52 2968 2977 46
h0t861s4 20040520214825 md5845372.msg 110 15 Match 122016 52 3346 3355 46
h0t861s4 20040520214825 md5845372.msg 110 15 Final 29048 52 0 5504 46

which looks good (total execution time about 125ms)

When I have a persistent version running (max 50 ms polling time), I get:

h0t861s4 20040520214841 md5845373.msg 0 16 Clean 0 0 0 3597 53
h0t861s4 20040520214852 md5845374.msg 16 31 Match 119377 62 684 741 38
h0t861s4 20040520214852 md5845374.msg 16 31 Final 119377 62 0 3810 38
h0t861s4 20040520215115 md5845375.msg 0 31 Match 29081 63 2413 2432 44
h0t861s4 20040520215115 md5845375.msg 0 31 Final 29081 63 0 9458 44
h0t861s4 20040520215134 md5845376.msg 0 94 Clean 0 0 0 24370 42
h0t861s4 20040520215320 md5845377.msg 47 15 Clean 0 0 0 1945 35

Which are very good exec times (average 45 ms).

We have created our own program that does lots of spam checking for messages. At some point, it fires Sniffer. We log the time it takes for Sniffer to run, for statistical purposes.

When sniffer is NOT persistent, I get the following log snippet (same messages as 1st sniffer log above, the second number after the .msg is the time it takes for sniffer to run):

0,"2004-05-20 23:47:18",md5845369.msg,172,157,0,15,15,0,43406,2
0,"2004-05-20 23:47:18",md5845370.msg,172,156,16,0,0,0,43309,2
0,"2004-05-20 23:48:04",md5845371.msg,188,172,0,15,0,15,3578,1
0,"2004-05-20 23:48:25",md5845372.msg,186,156,14,0,0,0,5572,1

Average time to run sniffer is 160 ms (sniffer said 125 ms). That means sniffer can't report about 35 ms, which is normal for application startup and shutdown (also the log is written _after_ the exec time calculation has been made, file operations also take time).

But, now comes the big mystery: when persistent mode is ON, it takes a lot more time to execute (while max polling is only 50ms!)

0,"2004-05-20 23:48:41",md5845373.msg,827,812,15,0,0,0,3607,1
0,"2004-05-20 23:48:52",md5845374.msg,842,812,0,0,0,0,3833,1
0,"2004-05-20 23:51:15",md5845375.msg,936,874,0,0,0,0,9560,1
0,"2004-05-20 23:51:35",md5845376.msg,889,859,15,0,0,0,26387,0
0,"2004-05-20 23:53:21",md5845377.msg,937,922,0,15,0,15,1922,0

Which averages at 850 ms! While I expected 45 + 25 ms (to compensate for average waiting time) = 70 ms!

Pete, could you please check why this is happening (particularly in code OUTSIDE what's measured and logged)? If you can't find anything, I'll ask my colleague to come up with a timing program, which I would like to release on this list so other ppl can check how long it really takes to execute sniffer (measured from 'the outside').

Regards,

ing. Michiel Prins
SOS Small Office Solutions / REJECT
Wannepad 27
1066 HW Amsterdam
tel. 020-4082627
fax. 020-4082628
[EMAIL PROTECTED]
Spam-free business e-mail? reject.nl!
Consultancy - Installation - Maintenance
Network Security - Project Management
Software Development - Internet - E-mail
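The averages quoted in this message can be reproduced from the comma-separated snippets of Michiel's own log. The sketch below assumes only what the message states, namely that the second number after the .msg filename is the time in milliseconds that Sniffer took to run; the meaning of the other fields is not given in the thread, so they are ignored. The function name is illustrative.

```python
import csv
from io import StringIO
from statistics import mean

NON_PERSISTENT_LOG = '''\
0,"2004-05-20 23:47:18",md5845369.msg,172,157,0,15,15,0,43406,2
0,"2004-05-20 23:47:18",md5845370.msg,172,156,16,0,0,0,43309,2
0,"2004-05-20 23:48:04",md5845371.msg,188,172,0,15,0,15,3578,1
0,"2004-05-20 23:48:25",md5845372.msg,186,156,14,0,0,0,5572,1
'''

PERSISTENT_LOG = '''\
0,"2004-05-20 23:48:41",md5845373.msg,827,812,15,0,0,0,3607,1
0,"2004-05-20 23:48:52",md5845374.msg,842,812,0,0,0,0,3833,1
0,"2004-05-20 23:51:15",md5845375.msg,936,874,0,0,0,0,9560,1
0,"2004-05-20 23:51:35",md5845376.msg,889,859,15,0,0,0,26387,0
0,"2004-05-20 23:53:21",md5845377.msg,937,922,0,15,0,15,1922,0
'''

# Column 2 is the filename, so the "second number after the .msg" is column 4.
SNIFFER_MS_COLUMN = 4


def mean_sniffer_ms(log_text: str) -> float:
    """Average externally measured Sniffer run time, in milliseconds."""
    rows = csv.reader(StringIO(log_text))
    return mean(int(row[SNIFFER_MS_COLUMN]) for row in rows if row)


print(f"non-persistent: {mean_sniffer_ms(NON_PERSISTENT_LOG):.2f} ms")
print(f"persistent:     {mean_sniffer_ms(PERSISTENT_LOG):.2f} ms")
```

This gives about 160 ms for the non-persistent runs and about 856 ms for the persistent runs, in line with the rounded figures of 160 ms and 850 ms given above.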
Re: [sniffer] Possible blip?
At 05:00 PM 5/19/2004, you wrote:

> I haven't yet upgraded to the most recent release, I'm still on the prior beta. I'll probably do that this evening. I tend to wait on upgrades until there has been enough time for bugs to surface unless I am already looking for a fix. I'm sure that the extra verification of the rulebase will help prevent the potential of problems, and I guess this has the possibility of being caused by a bit of corrupted data, though that's probably reaching.

There were no substantive changes from the beta to the production version. Largely just a removal of monitoring code.

> Again, regardless if there was a blip, Sniffer still does a wonderful job of tagging lots and lots of E-mail, just not quite as much as the day before.

Last night I was able to adjust the rule strength analysis window back to its original settings. About 5 days of data were lost - but those days will be recovered quickly. Please let me know if this adjustment improved your conditions.

I've noted that on a number of other lists there seem to be posts about a sudden increase in spam over the past few days. We are definitely seeing this also - approximately a 25% or more increase in new rule additions in the past 4 days:

http://www.sortmonster.com/MessageSniffer/Performance/ChangeRates.jsp

Specifically note from about 4 days ago...

Days Ago  Adjustments
---
0         356
1         508
2         391
3         410
4         410
5         326
6         309
7         371
8         292
9         347
10        309

( 5-10 : 1954/6 -> 325.67, 0-5 : 2075/5 -> 415, 325.67/415 -> 78.47 )

Note that day 0 is not complete. So applying a "fudge factor" 78.4 _looks like_ 75%. Besides, 92% of statistics are made up on the spot anyway %^b

I think a number of things are combined here... I just want to get a good handle on them and make sure we are doing the best we can.

I've noted, Matt, that your rulebase tuning parameters are set at the defaults. If you would like to adjust these to be more aggressive then please let me know off list (support@).

More aggressive settings will keep more rules active in your rulebase at lower strengths and will also allow new rules more time to gain strength before being evaluated. Respectively the current defaults are:

Minimum Rule Strength: 1.0
Grace Period: 5 days.

Adjusting these settings can significantly increase the size of your rulebase file.

Best,
_M
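The change-rate comparison in this message can be checked with a few lines of arithmetic. The sketch uses only the adjustment counts posted above; the split into a "recent" group (days 0 to 4, which the message labels 0-5 but divides by 5) and a "baseline" group (days 5 to 10) follows the sums given in the parenthetical, and the variable names are illustrative.

```python
# Rule adjustments per day; the list index is "days ago".
adjustments = [356, 508, 391, 410, 410, 326, 309, 371, 292, 347, 309]

recent = adjustments[0:5]     # days 0-4 (day 0 is incomplete)
baseline = adjustments[5:11]  # days 5-10

recent_avg = sum(recent) / len(recent)
baseline_avg = sum(baseline) / len(baseline)

# Baseline as a fraction of the recent rate, as in the message.
ratio = baseline_avg / recent_avg
# The same comparison expressed as an increase over the baseline.
increase = recent_avg / baseline_avg - 1

print(f"recent:   {sum(recent)}/{len(recent)} -> {recent_avg:.2f}")
print(f"baseline: {sum(baseline)}/{len(baseline)} -> {baseline_avg:.2f}")
print(f"baseline/recent: {ratio:.2%}")
print(f"increase over baseline: {increase:.1%}")
```

The sums and averages match the posted 2075/5 and 1954/6, and the ratio matches 78.47%. Expressed as growth over the baseline, that is roughly a 27% increase, consistent with the "25% or more" estimate even before allowing for day 0 being incomplete.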
Re: [sniffer] Possible blip?
Pete,

I was judging based on the size of our Hold range which scores from 10-24. On Monday that was 1.86% of total traffic, but on Tuesday that was 2.83%. Message volume was hardly different. Other notables were that on Monday, Sniffer hit 77.27% of all E-mail but on Tuesday it hit 74.53% (both exclude Gray hits). Our overall spam percentage was about 82% on Monday and 81% on Tuesday. I did also see a drop in XBL hits, which are primarily zombies, from 38.14% to 34.93%. I've always found static spammers to be much more problematic because they lack many spammy patterns, and it could be that there was a wave of them that came online yesterday which could account for the difference.

I don't want to make a huge deal out of this, but I noted the drop in size from one rulebase to another and thought that might be significant, and I like to be aware of what is going on. In reality though the difference in percentages in our Hold file meant manually reviewing 50% more E-mails, or about 500 extra messages. With everything else consistent, I figured it was worth a post just to check.

I do recall an old posting where you indicated that you were going to drop the expiration down to 5 days under a certain number of hits. My thought there is that while it does present some savings in processing, it might make more sense to do a 7-8 day expiration in order to help catch spammers that are on weekly schedules, primarily lower volume niche spammers. Unfortunately I can't compare my current results accurately to the pre-change data because the makeup of my traffic has changed significantly over that time frame.

Another possibility is that our Chinese language spam might have been extra heavy. I've brought in much more of that recently from a couple different clients and it regularly scores low, probably because it's difficult to determine if most of it is spam. I do know that Sniffer doesn't do nearly as well with this stuff.
I've noticed that these guys are spamming mostly during Chinese business hours, and they might have been extra light on Monday due to the lag in hours coming from a weekend. If you are interested in getting these caught messages forwarded to you in an automated fashion for study or for potential inclusion, just let me know. I also have a filter set up for Russian language E-mail, but it is not nearly as high in volume (now).

Regarding when I saw the changes in the rule base, I was pulling an all-nighter for server administration and noticed this around 5 a.m. when I ran the stats program on my Declude logs. The renamed 'old' rulebase was just over 4 MB while the active one was 4.7 MB, then at about noon I noticed it was about 4.3 MB, and now it's back up over 4.7 MB (1,000 KB = 1 MB in these stats if that matters).

I haven't yet upgraded to the most recent release, I'm still on the prior beta. I'll probably do that this evening. I tend to wait on upgrades until there has been enough time for bugs to surface unless I am already looking for a fix. I'm sure that the extra verification of the rulebase will help prevent the potential of problems, and I guess this has the possibility of being caused by a bit of corrupted data, though that's probably reaching.

Again, regardless if there was a blip, Sniffer still does a wonderful job of tagging lots and lots of E-mail, just not quite as much as the day before.

Thanks,
Matt

Pete McNeil wrote:

At 12:57 PM 5/19/2004, you wrote:

Pete, I noted late last night that my rulebase grew by 700 KB over the size of the previous one that was archived on my machine, and also the hits for some of the tests were noticeably lower and I had a definite increase in the number of messages that scored in my Hold range (instead of scoring higher and landing in Drop). This morning though the size of my rulebase again dropped by about 450 KB.
I was just wondering if this might have been a hiccup with a bad compilation or maybe you were testing something out?

We didn't have anything under test that would alter the rulebases. I'm going to dig through the logs and see if there's anything I can identify. If the rulebase was corrupted in any way you would have been able to detect that with the latest snf2check utility.

It's not unusual for rulebase sizes to change by as much as 20%. The system is constantly activating and deactivating rules based on new log files that are reported. Currently a significant change might occur once per day - though we are working on new analysis engines that will permit more frequent rule strength adjustments. For example, we might add 300-900 rules over the course of a day - then have that many (or more) removed when the new rule strength numbers are calculated.

Another factor that impacts rulebase size is the content of the rules. The folding process is not deterministic so it is possible for a few rule changes to significantly alter the way the rulebase file is folded.
Re: [sniffer] Possible blip?
At 12:57 PM 5/19/2004, you wrote:

Pete, I noted late last night that my rulebase grew by 700 KB over the size of the previous one that was archived on my machine, and also the hits for some of the tests were noticeably lower and I had a definite increase in the number of messages that scored in my Hold range (instead of scoring higher and landing in Drop). This morning though the size of my rulebase again dropped by about 450 KB. I was just wondering if this might have been a hiccup with a bad compilation or maybe you were testing something out?

We didn't have anything under test that would alter the rulebases. I'm going to dig through the logs and see if there's anything I can identify. If the rulebase was corrupted in any way you would have been able to detect that with the latest snf2check utility.

It's not unusual for rulebase sizes to change by as much as 20%. The system is constantly activating and deactivating rules based on new log files that are reported. Currently a significant change might occur once per day - though we are working on new analysis engines that will permit more frequent rule strength adjustments. For example, we might add 300-900 rules over the course of a day - then have that many (or more) removed when the new rule strength numbers are calculated.

Another factor that impacts rulebase size is the content of the rules. The folding process is not deterministic so it is possible for a few rule changes to significantly alter the way the rulebase file is folded. This is less likely to be the change but it is possible.

What was the date on the archive you used to compare sizes?

_M

This E-Mail came from the Message Sniffer mailing list. For information and (un)subscription instructions go to http://www.sortmonster.com/MessageSniffer/Help/Help.html
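[Editor's note] To put the "as much as 20%" remark next to the sizes reported elsewhere in this thread (just over 4 MB, then 4.7 MB, about 4.3 MB, and back over 4.7 MB), here is a small sketch. The sizes are approximations taken from the discussion, "just over 4 MB" is rounded to 4.0 as an assumption, and the function name is illustrative only.

```python
def percent_change(old_size, new_size):
    """Relative change from old_size to new_size, in percent."""
    return (new_size - old_size) / old_size * 100.0


# Approximate rulebase sizes in MB, in the order they were observed.
sizes_mb = [4.0, 4.7, 4.3, 4.7]
normal_swing = 20.0  # "as much as 20%" per the message above

changes = [percent_change(a, b) for a, b in zip(sizes_mb, sizes_mb[1:])]

for (before, after), change in zip(zip(sizes_mb, sizes_mb[1:]), changes):
    verdict = "within" if abs(change) <= normal_swing else "outside"
    print(f"{before:.1f} MB -> {after:.1f} MB: {change:+.1f}% ({verdict} the usual range)")
```

Under these rounded figures each step stays below the 20% mark, with the initial jump of roughly 17.5% being the largest.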
[sniffer] Possible blip?
Pete, I noted late last night that my rulebase grew by 700 KB over the size of the previous one that was archived on my machine, and also the hits for some of the tests were noticeably lower and I had a definite increase in the number of messages that scored in my Hold range (instead of scoring higher and landing in Drop). This morning though the size of my rulebase again dropped by about 450 KB. I was just wondering if this might have been a hiccup with a bad compilation or maybe you were testing something out? Thanks, Matt -- = MailPure custom filters for Declude JunkMail Pro. http://www.mailpure.com/software/ =