Robert Menschel writes:
> Justin, could you repeat a mass-check and that analysis on this rule,
> which I'm willing to sacrifice for the sake of science? Not
> necessarily now, but a month or two from now?
> 
> header    SARE_SUBJ_MED_USE        Subject =~ /\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)/i
> describe  SARE_SUBJ_MED_USE        Spam topic found in subject
> score     SARE_SUBJ_MED_USE        1.666
> #hist     SARE_SUBJ_MED_USE        Bob Menschel, May 14 2005
> #counts   SARE_SUBJ_MED_USE        208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05
> #max      SARE_SUBJ_MED_USE        253s/0h of 275081 corpus (134226s/140855h RM) 05/30/05
> #counts   SARE_SUBJ_MED_USE        2s/0h of 5648 corpus (1019s/4629h ft) 06/04/05
> #counts   SARE_SUBJ_MED_USE        0s/0h of 55803 corpus (18630s/37173h JH-3.01) 06/10/05
> #counts   SARE_SUBJ_MED_USE        108s/0h of 49034 corpus (44877s/4157h MY) 06/11/05
> #counts   SARE_SUBJ_MED_USE        1s/0h of 11269 corpus (6578s/4691h CT) 06/11/05
> 
> This rule was developed just over two months ago, flagging the emails
> whose subjects tied drugs to symptoms, or drugs to reliefs, or drugs
> to each other. It's not a big hitter, but it's reliable and effective
> in its own way.
> 
> I don't expect too much of a drop in spam matching this rule in the
> two months since it appeared in our SARE rule set. It was a fairly
> quiet addition.
> 
> But now that aware spammers reading this list know that if they tie a
> drug name to its symptom, or to its function, or to its companion
> drugs, we will catch it (and yes, we do add new patterns to this rule
> as we find them), I expect most of those will find alternate ways to
> word their subjects to avoid this category of pattern, and by the end
> of August the hit rates on this rule should decrease significantly.
> 
> I would be interested in seeing if that expectation matches reality.
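
For anyone who wants to poke at the quoted pattern outside SpamAssassin, here
is a minimal Python sketch.  It assumes the Perl regex carries over to
Python's "re" module unchanged (with /i mapped to re.IGNORECASE); the
subject_hits() helper and the sample subjects are made up for illustration
and are not taken from any corpus.

```python
import re

# Python transcription of the quoted SARE_SUBJ_MED_USE pattern.
# Assumption: Perl's /i flag maps to re.IGNORECASE and nothing else changes.
SARE_SUBJ_MED_USE = re.compile(
    r"\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)",
    re.IGNORECASE,
)


def subject_hits(subject: str) -> bool:
    """Return True if the subject line would match the pattern (hypothetical helper)."""
    return SARE_SUBJ_MED_USE.search(subject) is not None


# Invented sample subjects, for illustration only.
samples = [
    "Medication commonly used to help relieve pain",
    "Pills used to treat symptoms",
    "Used cars for sale",
    "Meeting notes for Tuesday",
]

for s in samples:
    print(f"{'HIT ' if subject_hits(s) else 'miss'}  {s}")
```

Note that the pattern needs three word characters and a space before "used",
so a subject that merely starts with "Used" does not match.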

Just to follow up on this (as requested).... it looks like the answer
is "no, but not in the way you were thinking".

Unfortunately the test didn't produce useful data.  For one thing, the hit
rate of the rule was too low in the first place; for another, the rule had
already stopped hitting spam 3 months before that message was posted.  The
spammers had already moved on.

My corpus got a *maximum* of 8 messages hitting this rule in one week (the
week of Apr 4). The last hit I saw on the rule was in a mail received on
Apr 23 22:44:56 2005.  Not a single hit after that date... so I think the
test is inconclusive.

Could it be they noticed the rule appearing in the ruleset, regardless
of its discussion or lack thereof?

I've attached the data anyway, if anyone wants a look -- one row per week:
the first column is the time value (UNIX format), the second is the number
of hits on the rule, and the third is the total number of spam mails
received in that period.

--j.

1067923071 0 6
1068527871 0 11
1069132671 0 446
1069737471 0 454
1070342271 0 638
1070947071 0 593
1071551871 0 395
1072156671 0 0
1072761471 0 1501
1073366271 0 758
1073971071 0 773
1074575871 0 605
1075180671 0 676
1075785471 0 492
1076390271 0 405
1076995071 0 384
1077599871 0 465
1078204671 0 445
1078809471 0 600
1079414271 0 919
1080019071 0 1100
1080623871 0 1032
1081228671 0 1315
1081833471 0 1421
1082438271 0 1334
1083043071 0 1440
1083647871 0 2419
1084252671 0 3308
1084857471 0 3658
1085462271 0 3382
1086067071 0 3644
1086671871 0 3842
1087276671 0 4127
1087881471 0 3955
1088486271 0 3731
1089091071 0 3239
1089695871 0 3419
1090300671 0 3743
1090905471 0 3589
1091510271 0 3975
1092115071 0 4087
1092719871 0 4119
1093324671 1 4031
1093929471 0 3830
1094534271 0 4416
1095139071 0 4354
1095743871 0 4358
1096348671 0 4184
1096953471 0 4214
1097558271 0 4080
1098163071 0 4328
1098767871 0 4081
1099372671 0 4455
1099977471 0 3810
1100582271 0 4570
1101187071 0 4494
1101791871 0 4794
1102396671 0 4689
1103001471 2 4843
1103606271 0 4635
1104211071 0 4547
1104815871 0 5118
1105420671 0 5888
1106025471 0 6176
1106630271 2 6324
1107235071 3 5906
1107839871 2 5413
1108444671 5 5597
1109049471 2 4867
1109654271 3 4799
1110259071 1 4727
1110863871 7 5048
1111468671 3 4806
1112073471 3 4455
1112678271 8 4244
1113283071 6 4531
1113887871 5 4718
1114492671 0 4639
1115097471 0 4662
1115702271 0 10215
1116307071 0 18921
1116911871 0 5388
1117516671 0 5727
1118121471 0 6033
1118726271 0 5844
1119331071 0 5936
1119935871 0 6246
1120540671 0 6802
1121145471 0 6046
1121750271 0 4260
1122355071 0 2928
1122959871 0 5105
1123564671 0 5428
1124169471 0 5562
1124774271 0 5323
1125379071 0 5885
1125983871 0 5654
1126588671 0 6661
1127193471 0 6951
1127798271 0 7376
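
The three-column data above can be summarized with a short Python sketch like
the one below.  It embeds only a handful of the rows to stay self-contained,
and it labels each weekly bucket with the UTC date of its timestamp (an
assumption: the data does not say whether the timestamp marks the start or
the end of the week, or which timezone was used).  The function names are
hypothetical.

```python
from datetime import datetime, timezone

# A few rows copied from the attached data: timestamp, rule hits, total spam.
SAMPLE = """\
1112073471 3 4455
1112678271 8 4244
1113283071 6 4531
1113887871 5 4718
1114492671 0 4639
1115097471 0 4662
"""


def parse_rows(text):
    """Parse 'timestamp hits total' lines into a list of int tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        ts, hits, total = (int(p) for p in parts)
        rows.append((ts, hits, total))
    return rows


def bucket_date(ts):
    """UTC calendar date of a UNIX timestamp, as YYYY-MM-DD."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()


def summarize(rows):
    """Return (row with the most hits, latest row with any hits or None)."""
    with_hits = [r for r in rows if r[1] > 0]
    peak = max(rows, key=lambda r: r[1])
    last = max(with_hits, key=lambda r: r[0]) if with_hits else None
    return peak, last


rows = parse_rows(SAMPLE)
peak, last = summarize(rows)

for ts, hits, total in rows:
    rate = 100.0 * hits / total if total else 0.0  # guard against empty weeks
    print(f"{bucket_date(ts)}  {hits:2d} / {total:5d}  ({rate:.3f}%)")

print("peak week:", bucket_date(peak[0]), "with", peak[1], "hits")
if last is not None:
    print("last week with hits:", bucket_date(last[0]))
```

Feeding the full attachment to parse_rows() instead of SAMPLE gives the same
per-week view for the whole period; the zero-total guard matters because one
week in the data has no spam recorded at all.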
