Just committed a simple log suppressor for these kinds of spam..
Jun 7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" Jun 7 11:25:44.269 [1569] dbg: --- last message repeated 21 times --- On Fri, Jun 07, 2019 at 09:50:51AM +0300, Henrik K wrote: > > What does "unreadable for rule analysis" mean? Surely no one is actually > manually reading such lines one rule at a time? Computers can check and > grep for you.. ;-) > > I think this needs a little bit more of thought what we really want to > accomplish here and maybe do it in a bug along with the new templates and > stuff if needed.. > > > > On Thu, Jun 06, 2019 at 07:48:02AM -0400, Kevin A. McGrail wrote: > > That is a frightening one liner. Should we use it? > > > > As for the more output comment, if you have emails with 300 lower case e's, > > you > > get 300 hits for the subtext. It is unreadable for rule analysis. > > > > As for modifying the normal output, I have no idea if anyone out there is > > using > > the public routine so better to be safe. > > > > I didn't find a tag for subtests either. That might be a good 4.0 addition. > > > > Regards, KAM > > > > On Thu, Jun 6, 2019, 01:30 Henrik K <[1][email protected]> wrote: > > > > > > Well in theory you see _more_ debug output now when there are no > > duplicates, > > due to the stats string.. honestly atleast I wouldn't care about that. > > Feel free to vote. > > > > As a silly morning exercise, here's a one-liner that compacts stuff :-P > > > > my $foo = '__A,__B,__C,__C,__C,__CC,__D,__D,__E,__E'; > > my $m; $foo =~ s/([^,]+)(?{$m=1})(?:,\1(?=,|$)(?{$m++}))+/"$1($m)"/eg; > > > > __A,__B,__C(3),__CC,__D(2),__E(2) > > > > > > On Wed, Jun 05, 2019 at 08:25:00PM -0400, Kevin A. McGrail wrote: > > > Good point, Henrik & John. > > > > > > OK, I've left the output alone except for the calls from dbg so it > > > shouldn't break anything in the public interface. > > > > > > Thoughts on this version? > > > > > > Regards, > > > KAM > > > > > > On 6/4/2019 1:51 PM, John Hardin wrote: > > > > On Tue, 4 Jun 2019, Kevin A. McGrail wrote: > > > > > > > >> Yes, I was thinking about that and wanting to fix uritests so well > > > >> for the > > > >> template. Thanks for the feedback. I will take another pass at > > it. > > > > > > > > Just do the deduplication without modifying the output format. > > > > > > > > If we want to log the hit counts, then make another function that > > does > > > > what you did and use it for logging. > > > > > > > > > > > >> On Tue, Jun 4, 2019, 03:23 Henrik K <[2][email protected]> wrote: > > > >> > > > >>> > > > >>> If you want to modify debug output, you have to modify only the > > dbg() > > > >>> output > > > >>> itself. You can't modify internal functions that have specific > > output > > > >>> formats and start adding random strings to them. Atleast these > > places > > > >>> depend on the comma delimited rules: > > > >>> > > > >>> ./masses/mass-check: push @tests, split(/,/, > > > >>> $status->get_names_of_subtests_hit()); > > > >>> ./t/rule_tests.t: my %rules_hit = map { $_ => 1 } > > > >>> split(/,/,$msg->get_names_of_tests_hit()), > > > >>> split(/,/,$msg->get_names_of_subtests_hit()); > > > >>> ./t.rules/run: my $testsline = > > > >>> $status->get_names_of_tests_hit().",".$status-> > > get_names_of_subtests_hit(); > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> On Tue, Jun 04, 2019 at 01:56:26AM -0400, Kevin A. McGrail wrote: > > > >>>> Morning All, > > > >>>> > > > >>>> After a few thoughts on limits, it appears that any duplicate > > subtest > > > >>>> hits are best combined for debug output. > > > >>>> > > > >>>> Any thoughts on the attached? It looks like it will help me with > > rule > > > >>>> development while support rules with valid but large maxhits like > > > >>> __LOWER_E > > > >>>> > > > >>>> Regards, > > > >>>> KAM > > > >>>> > > > >>>> On 5/31/2019 10:30 AM, Bill Cole wrote: > > > >>>>> On 30 May 2019, at 20:35, Kevin A. McGrail wrote: > > > >>>>> > > > >>>>>> I was curious if anyone noticed the debug output for subtests > > has > > > >>> gotten > > > >>>>>> insane: > > > >>>>> > > > >>>>> It got a little discussion on users@ when I created those rules. > > > >>>>> > > > >>>>> [...] > > > >>>>> > > > >>>>>> [3]72_active.cf: body __LOWER_E /e/ > > > >>>>>> [4]72_active.cf: tflags __LOWER_E multiple > > > >>>>>> maxhits=230 > > > >>>>>> > > > >>>>>> [5]72_active.cf: body __E_LIKE_LETTER /<lcase_e>/ > > > >>>>>> [6]72_active.cf: tflags __E_LIKE_LETTER multiple > > > >>>>>> maxhits=320 > > > >>>>>> > > > >>>>>> Assuming those maxhits are correct, > > > >>>>> > > > >>>>> They are. In fact they were carefully tuned to catch the > > targeted > > > >>>>> extortion spam. > > > >>>>> > > > >>>>>> maybe we need something in the debug > > > >>>>>> output that says __E_LIKE_LETTER (number of hits if more than > > 1). > > > >>>>> > > > >>>>> That would be a useful enhancement even without my flagrant log > > > >>>>> vandalism. > > > >>>>> > > > >>>> > > > >>>> -- > > > >>>> Kevin A. McGrail > > > >>>> Member, Apache Software Foundation > > > >>>> Chair Emeritus Apache SpamAssassin Project > > > >>>> [7]https://www.linkedin.com/in/kmcgrail - 703.798.0171 > > > >>>> > > > >>> > > > >>>> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm > > > >>>> > > =================================================================== > > > >>>> --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision > > 1860582) > > > >>>> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy) > > > >>>> @@ -769,7 +769,38 @@ > > > >>>> sub get_names_of_subtests_hit { > > > >>>> my ($self) = @_; > > > >>>> > > > >>>> - return join(',', sort @{$self->{subtest_names_hit}}); > > > >>>> + #return join(',', sort @{$self->{subtest_names_hit}}); > > > >>>> + > > > >>>> + #This routine prints only one instance of a subrule hit with a > > > >>>> count > > > >>> of how many times it hit if greater than 1 > > > >>>> + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, > > $rule, > > > >>> $total_hits, $deduplicated_hits); > > > >>>> + > > > >>>> + $total_hits = scalar(@{$self->{subtest_names_hit}}); > > > >>>> + > > > >>>> + for ($i=0; $i < $total_hits; $i++) { > > > >>>> + $rule = ${$self->{subtest_names_hit}}[$i]; > > > >>>> + $subtest_names_hit{$rule}++; > > > >>>> + } > > > >>>> + > > > >>>> + foreach $key (keys %subtest_names_hit) { > > > >>>> + push (@keys, $key); > > > >>>> + } > > > >>>> + @sorted = sort @keys; > > > >>>> + > > > >>>> + $deduplicated_hits = scalar(@sorted); > > > >>>> + > > > >>>> + for ($i=0; $i < $deduplicated_hits; $i++) { > > > >>>> + $string .= $sorted[$i]; > > > >>>> + if ($subtest_names_hit{$sorted[$i]} > 1) { > > > >>>> + $string .= "($subtest_names_hit{$sorted[$i]})" > > > >>>> + } > > > >>>> + $string .= ","; > > > >>>> + } > > > >>>> + > > > >>>> + $string =~ s/,$//; > > > >>>> + > > > >>>> + $string .= " (Total Subtest Hits: $total_hits / Deduplicated > > Total > > > >>> Hits: $deduplicated_hits)"; > > > >>>> + > > > >>>> + return $string; > > > >>>> } > > > >>>> > > > >>>> > > > >>> > > ##################################################################### > > ###### > > > >>> > > > >>> > > > >>> > > > >> > > > > > > > > > > -- > > > Kevin A. McGrail > > > Member, Apache Software Foundation > > > Chair Emeritus Apache SpamAssassin Project > > > [8]https://www.linkedin.com/in/kmcgrail - 703.798.0171 > > > > > > > > Index: lib/Mail/SpamAssassin/PerMsgStatus.pm > > > =================================================================== > > > --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582) > > > +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy) > > > @@ -398,7 +398,7 @@ > > > dbg("check: is spam? score=".$self->{score}. > > > " > > required=".$self->{conf}->{required_score}); > > > dbg("check: tests=".$self->get_names_of_tests_hit()); > > > - dbg("check: subtests=".$self->get_names_of_subtests_hit()); > > > + dbg("check: subtests=".$self->get_names_of_subtests_hit("dbg")); > > > $self->{is_spam} = $self->is_spam(); > > > > > > $self->{main}->{resolver}->bgabort(); > > > @@ -764,12 +764,52 @@ > > > normally-hidden rules, which score 0 and have names beginning with > > two > > > underscores, used in meta rules. > > > > > > +If a parameter of dbg is passed, the output will be more condensed > > and > > > +sub-tests with multiple hits reduced to one entry with the number of > > hits > > > +in parentheses. Some information is also added at the end regarding > > the > > > +multiple hits. > > > + > > > =cut > > > > > > sub get_names_of_subtests_hit { > > > - my ($self) = @_; > > > + my ($self, $mode) = @_; > > > > > > - return join(',', sort @{$self->{subtest_names_hit}}); > > > + if (defined $mode && $mode eq 'dbg') { > > > + #This routine prints only one instance of a subrule hit with a > > count > > of how many times it hit if greater than 1 > > > + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule, > > $total_hits, $deduplicated_hits); > > > + > > > + $total_hits = scalar(@{$self->{subtest_names_hit}}); > > > + > > > + for ($i=0; $i < $total_hits; $i++) { > > > + $rule = ${$self->{subtest_names_hit}}[$i]; > > > + $subtest_names_hit{$rule}++; > > > + } > > > + > > > + foreach $key (keys %subtest_names_hit) { > > > + push (@keys, $key); > > > + } > > > + @sorted = sort @keys; > > > + > > > + $deduplicated_hits = scalar(@sorted); > > > + > > > + for ($i=0; $i < $deduplicated_hits; $i++) { > > > + $string .= $sorted[$i]; > > > + if ($subtest_names_hit{$sorted[$i]} > 1) { > > > + $string .= "($subtest_names_hit{$sorted[$i]})" > > > + } > > > + $string .= ","; > > > + } > > > + > > > + $string =~ s/,$//; > > > + > > > + $string .= " (Total Subtest Hits: $total_hits / Deduplicated > > Total > > Hits: $deduplicated_hits)"; > > > + > > > + return $string; > > > + > > > + } else { > > > + #return the simpler string with duplicates and commas > > > + return join(',', sort @{$self->{subtest_names_hit}}); > > > + } > > > } > > > > > > > > ######################################################################## > > ### > > > > > > > > On Thu, Jun 6, 2019, 01:30 Henrik K <[9][email protected]> wrote: > > > > > > Well in theory you see _more_ debug output now when there are no > > duplicates, > > due to the stats string.. honestly atleast I wouldn't care about that. > > Feel free to vote. > > > > As a silly morning exercise, here's a one-liner that compacts stuff :-P > > > > my $foo = '__A,__B,__C,__C,__C,__CC,__D,__D,__E,__E'; > > my $m; $foo =~ s/([^,]+)(?{$m=1})(?:,\1(?=,|$)(?{$m++}))+/"$1($m)"/eg; > > > > __A,__B,__C(3),__CC,__D(2),__E(2) > > > > > > On Wed, Jun 05, 2019 at 08:25:00PM -0400, Kevin A. McGrail wrote: > > > Good point, Henrik & John. > > > > > > OK, I've left the output alone except for the calls from dbg so it > > > shouldn't break anything in the public interface. > > > > > > Thoughts on this version? > > > > > > Regards, > > > KAM > > > > > > On 6/4/2019 1:51 PM, John Hardin wrote: > > > > On Tue, 4 Jun 2019, Kevin A. McGrail wrote: > > > > > > > >> Yes, I was thinking about that and wanting to fix uritests so well > > > >> for the > > > >> template. Thanks for the feedback. I will take another pass at > > it. > > > > > > > > Just do the deduplication without modifying the output format. > > > > > > > > If we want to log the hit counts, then make another function that > > does > > > > what you did and use it for logging. > > > > > > > > > > > >> On Tue, Jun 4, 2019, 03:23 Henrik K <[10][email protected]> wrote: > > > >> > > > >>> > > > >>> If you want to modify debug output, you have to modify only the > > dbg() > > > >>> output > > > >>> itself. You can't modify internal functions that have specific > > output > > > >>> formats and start adding random strings to them. Atleast these > > places > > > >>> depend on the comma delimited rules: > > > >>> > > > >>> ./masses/mass-check: push @tests, split(/,/, > > > >>> $status->get_names_of_subtests_hit()); > > > >>> ./t/rule_tests.t: my %rules_hit = map { $_ => 1 } > > > >>> split(/,/,$msg->get_names_of_tests_hit()), > > > >>> split(/,/,$msg->get_names_of_subtests_hit()); > > > >>> ./t.rules/run: my $testsline = > > > >>> $status->get_names_of_tests_hit().",".$status-> > > get_names_of_subtests_hit(); > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> On Tue, Jun 04, 2019 at 01:56:26AM -0400, Kevin A. McGrail wrote: > > > >>>> Morning All, > > > >>>> > > > >>>> After a few thoughts on limits, it appears that any duplicate > > subtest > > > >>>> hits are best combined for debug output. > > > >>>> > > > >>>> Any thoughts on the attached? It looks like it will help me with > > rule > > > >>>> development while support rules with valid but large maxhits like > > > >>> __LOWER_E > > > >>>> > > > >>>> Regards, > > > >>>> KAM > > > >>>> > > > >>>> On 5/31/2019 10:30 AM, Bill Cole wrote: > > > >>>>> On 30 May 2019, at 20:35, Kevin A. McGrail wrote: > > > >>>>> > > > >>>>>> I was curious if anyone noticed the debug output for subtests > > has > > > >>> gotten > > > >>>>>> insane: > > > >>>>> > > > >>>>> It got a little discussion on users@ when I created those rules. > > > >>>>> > > > >>>>> [...] > > > >>>>> > > > >>>>>> [11]72_active.cf: body __LOWER_E /e/ > > > >>>>>> [12]72_active.cf: tflags __LOWER_E multiple > > > >>>>>> maxhits=230 > > > >>>>>> > > > >>>>>> [13]72_active.cf: body __E_LIKE_LETTER > > /<lcase_e>/ > > > >>>>>> [14]72_active.cf: tflags __E_LIKE_LETTER multiple > > > >>>>>> maxhits=320 > > > >>>>>> > > > >>>>>> Assuming those maxhits are correct, > > > >>>>> > > > >>>>> They are. In fact they were carefully tuned to catch the > > targeted > > > >>>>> extortion spam. > > > >>>>> > > > >>>>>> maybe we need something in the debug > > > >>>>>> output that says __E_LIKE_LETTER (number of hits if more than > > 1). > > > >>>>> > > > >>>>> That would be a useful enhancement even without my flagrant log > > > >>>>> vandalism. > > > >>>>> > > > >>>> > > > >>>> -- > > > >>>> Kevin A. McGrail > > > >>>> Member, Apache Software Foundation > > > >>>> Chair Emeritus Apache SpamAssassin Project > > > >>>> [15]https://www.linkedin.com/in/kmcgrail - 703.798.0171 > > > >>>> > > > >>> > > > >>>> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm > > > >>>> > > =================================================================== > > > >>>> --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision > > 1860582) > > > >>>> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy) > > > >>>> @@ -769,7 +769,38 @@ > > > >>>> sub get_names_of_subtests_hit { > > > >>>> my ($self) = @_; > > > >>>> > > > >>>> - return join(',', sort @{$self->{subtest_names_hit}}); > > > >>>> + #return join(',', sort @{$self->{subtest_names_hit}}); > > > >>>> + > > > >>>> + #This routine prints only one instance of a subrule hit with a > > > >>>> count > > > >>> of how many times it hit if greater than 1 > > > >>>> + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, > > $rule, > > > >>> $total_hits, $deduplicated_hits); > > > >>>> + > > > >>>> + $total_hits = scalar(@{$self->{subtest_names_hit}}); > > > >>>> + > > > >>>> + for ($i=0; $i < $total_hits; $i++) { > > > >>>> + $rule = ${$self->{subtest_names_hit}}[$i]; > > > >>>> + $subtest_names_hit{$rule}++; > > > >>>> + } > > > >>>> + > > > >>>> + foreach $key (keys %subtest_names_hit) { > > > >>>> + push (@keys, $key); > > > >>>> + } > > > >>>> + @sorted = sort @keys; > > > >>>> + > > > >>>> + $deduplicated_hits = scalar(@sorted); > > > >>>> + > > > >>>> + for ($i=0; $i < $deduplicated_hits; $i++) { > > > >>>> + $string .= $sorted[$i]; > > > >>>> + if ($subtest_names_hit{$sorted[$i]} > 1) { > > > >>>> + $string .= "($subtest_names_hit{$sorted[$i]})" > > > >>>> + } > > > >>>> + $string .= ","; > > > >>>> + } > > > >>>> + > > > >>>> + $string =~ s/,$//; > > > >>>> + > > > >>>> + $string .= " (Total Subtest Hits: $total_hits / Deduplicated > > Total > > > >>> Hits: $deduplicated_hits)"; > > > >>>> + > > > >>>> + return $string; > > > >>>> } > > > >>>> > > > >>>> > > > >>> > > ##################################################################### > > ###### > > > >>> > > > >>> > > > >>> > > > >> > > > > > > > > > > -- > > > Kevin A. McGrail > > > Member, Apache Software Foundation > > > Chair Emeritus Apache SpamAssassin Project > > > [16]https://www.linkedin.com/in/kmcgrail - 703.798.0171 > > > > > > > > Index: lib/Mail/SpamAssassin/PerMsgStatus.pm > > > =================================================================== > > > --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582) > > > +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy) > > > @@ -398,7 +398,7 @@ > > > dbg("check: is spam? score=".$self->{score}. > > > " > > required=".$self->{conf}->{required_score}); > > > dbg("check: tests=".$self->get_names_of_tests_hit()); > > > - dbg("check: subtests=".$self->get_names_of_subtests_hit()); > > > + dbg("check: subtests=".$self->get_names_of_subtests_hit("dbg")); > > > $self->{is_spam} = $self->is_spam(); > > > > > > $self->{main}->{resolver}->bgabort(); > > > @@ -764,12 +764,52 @@ > > > normally-hidden rules, which score 0 and have names beginning with > > two > > > underscores, used in meta rules. > > > > > > +If a parameter of dbg is passed, the output will be more condensed > > and > > > +sub-tests with multiple hits reduced to one entry with the number of > > hits > > > +in parentheses. Some information is also added at the end regarding > > the > > > +multiple hits. > > > + > > > =cut > > > > > > sub get_names_of_subtests_hit { > > > - my ($self) = @_; > > > + my ($self, $mode) = @_; > > > > > > - return join(',', sort @{$self->{subtest_names_hit}}); > > > + if (defined $mode && $mode eq 'dbg') { > > > + #This routine prints only one instance of a subrule hit with a > > count > > of how many times it hit if greater than 1 > > > + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule, > > $total_hits, $deduplicated_hits); > > > + > > > + $total_hits = scalar(@{$self->{subtest_names_hit}}); > > > + > > > + for ($i=0; $i < $total_hits; $i++) { > > > + $rule = ${$self->{subtest_names_hit}}[$i]; > > > + $subtest_names_hit{$rule}++; > > > + } > > > + > > > + foreach $key (keys %subtest_names_hit) { > > > + push (@keys, $key); > > > + } > > > + @sorted = sort @keys; > > > + > > > + $deduplicated_hits = scalar(@sorted); > > > + > > > + for ($i=0; $i < $deduplicated_hits; $i++) { > > > + $string .= $sorted[$i]; > > > + if ($subtest_names_hit{$sorted[$i]} > 1) { > > > + $string .= "($subtest_names_hit{$sorted[$i]})" > > > + } > > > + $string .= ","; > > > + } > > > + > > > + $string =~ s/,$//; > > > + > > > + $string .= " (Total Subtest Hits: $total_hits / Deduplicated > > Total > > Hits: $deduplicated_hits)"; > > > + > > > + return $string; > > > + > > > + } else { > > > + #return the simpler string with duplicates and commas > > > + return join(',', sort @{$self->{subtest_names_hit}}); > > > + } > > > } > > > > > > > > ######################################################################## > > ### > > > > > > > > References: > > > > [1] mailto:[email protected] > > [2] mailto:[email protected] > > [3] http://72_active.cf/ > > [4] http://72_active.cf/ > > [5] http://72_active.cf/ > > [6] http://72_active.cf/ > > [7] https://www.linkedin.com/in/kmcgrail > > [8] https://www.linkedin.com/in/kmcgrail > > [9] mailto:[email protected] > > [10] mailto:[email protected] > > [11] http://72_active.cf/ > > [12] http://72_active.cf/ > > [13] http://72_active.cf/ > > [14] http://72_active.cf/ > > [15] https://www.linkedin.com/in/kmcgrail > > [16] https://www.linkedin.com/in/kmcgrail
