Well in theory you see _more_ debug output now when there are no duplicates,
due to the stats string.. honestly atleast I wouldn't care about that.
Feel free to vote.
As a silly morning exercise, here's a one-liner that compacts stuff :-P
my $foo = '__A,__B,__C,__C,__C,__CC,__D,__D,__E,__E';
my $m; $foo =~ s/([^,]+)(?{$m=1})(?:,\1(?=,|$)(?{$m++}))+/"$1($m)"/eg;
__A,__B,__C(3),__CC,__D(2),__E(2)
On Wed, Jun 05, 2019 at 08:25:00PM -0400, Kevin A. McGrail wrote:
> Good point, Henrik & John.
>
> OK, I've left the output alone except for the calls from dbg so it
> shouldn't break anything in the public interface.
>
> Thoughts on this version?
>
> Regards,
> KAM
>
> On 6/4/2019 1:51 PM, John Hardin wrote:
> > On Tue, 4 Jun 2019, Kevin A. McGrail wrote:
> >
> >> Yes, I was thinking about that and wanting to fix uritests so well
> >> for the
> >> template. Thanks for the feedback. I will take another pass at it.
> >
> > Just do the deduplication without modifying the output format.
> >
> > If we want to log the hit counts, then make another function that does
> > what you did and use it for logging.
> >
> >
> >> On Tue, Jun 4, 2019, 03:23 Henrik K <[email protected]> wrote:
> >>
> >>>
> >>> If you want to modify debug output, you have to modify only the dbg()
> >>> output
> >>> itself. You can't modify internal functions that have specific output
> >>> formats and start adding random strings to them. Atleast these places
> >>> depend on the comma delimited rules:
> >>>
> >>> ./masses/mass-check: push @tests, split(/,/,
> >>> $status->get_names_of_subtests_hit());
> >>> ./t/rule_tests.t: my %rules_hit = map { $_ => 1 }
> >>> split(/,/,$msg->get_names_of_tests_hit()),
> >>> split(/,/,$msg->get_names_of_subtests_hit());
> >>> ./t.rules/run: my $testsline =
> >>> $status->get_names_of_tests_hit().",".$status->get_names_of_subtests_hit();
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jun 04, 2019 at 01:56:26AM -0400, Kevin A. McGrail wrote:
> >>>> Morning All,
> >>>>
> >>>> After a few thoughts on limits, it appears that any duplicate subtest
> >>>> hits are best combined for debug output.
> >>>>
> >>>> Any thoughts on the attached? It looks like it will help me with rule
> >>>> development while support rules with valid but large maxhits like
> >>> __LOWER_E
> >>>>
> >>>> Regards,
> >>>> KAM
> >>>>
> >>>> On 5/31/2019 10:30 AM, Bill Cole wrote:
> >>>>> On 30 May 2019, at 20:35, Kevin A. McGrail wrote:
> >>>>>
> >>>>>> I was curious if anyone noticed the debug output for subtests has
> >>> gotten
> >>>>>> insane:
> >>>>>
> >>>>> It got a little discussion on users@ when I created those rules.
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>> 72_active.cf: body __LOWER_E /e/
> >>>>>> 72_active.cf: tflags __LOWER_E multiple
> >>>>>> maxhits=230
> >>>>>>
> >>>>>> 72_active.cf: body __E_LIKE_LETTER /<lcase_e>/
> >>>>>> 72_active.cf: tflags __E_LIKE_LETTER multiple
> >>>>>> maxhits=320
> >>>>>>
> >>>>>> Assuming those maxhits are correct,
> >>>>>
> >>>>> They are. In fact they were carefully tuned to catch the targeted
> >>>>> extortion spam.
> >>>>>
> >>>>>> maybe we need something in the debug
> >>>>>> output that says __E_LIKE_LETTER (number of hits if more than 1).
> >>>>>
> >>>>> That would be a useful enhancement even without my flagrant log
> >>>>> vandalism.
> >>>>>
> >>>>
> >>>> --
> >>>> Kevin A. McGrail
> >>>> Member, Apache Software Foundation
> >>>> Chair Emeritus Apache SpamAssassin Project
> >>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
> >>>>
> >>>
> >>>> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm
> >>>> ===================================================================
> >>>> --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582)
> >>>> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy)
> >>>> @@ -769,7 +769,38 @@
> >>>> sub get_names_of_subtests_hit {
> >>>> my ($self) = @_;
> >>>>
> >>>> - return join(',', sort @{$self->{subtest_names_hit}});
> >>>> + #return join(',', sort @{$self->{subtest_names_hit}});
> >>>> +
> >>>> + #This routine prints only one instance of a subrule hit with a
> >>>> count
> >>> of how many times it hit if greater than 1
> >>>> + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule,
> >>> $total_hits, $deduplicated_hits);
> >>>> +
> >>>> + $total_hits = scalar(@{$self->{subtest_names_hit}});
> >>>> +
> >>>> + for ($i=0; $i < $total_hits; $i++) {
> >>>> + $rule = ${$self->{subtest_names_hit}}[$i];
> >>>> + $subtest_names_hit{$rule}++;
> >>>> + }
> >>>> +
> >>>> + foreach $key (keys %subtest_names_hit) {
> >>>> + push (@keys, $key);
> >>>> + }
> >>>> + @sorted = sort @keys;
> >>>> +
> >>>> + $deduplicated_hits = scalar(@sorted);
> >>>> +
> >>>> + for ($i=0; $i < $deduplicated_hits; $i++) {
> >>>> + $string .= $sorted[$i];
> >>>> + if ($subtest_names_hit{$sorted[$i]} > 1) {
> >>>> + $string .= "($subtest_names_hit{$sorted[$i]})"
> >>>> + }
> >>>> + $string .= ",";
> >>>> + }
> >>>> +
> >>>> + $string =~ s/,$//;
> >>>> +
> >>>> + $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total
> >>> Hits: $deduplicated_hits)";
> >>>> +
> >>>> + return $string;
> >>>> }
> >>>>
> >>>>
> >>> ###########################################################################
> >>>
> >>>
> >>>
> >>
> >
>
> --
> Kevin A. McGrail
> Member, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm
> ===================================================================
> --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582)
> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy)
> @@ -398,7 +398,7 @@
> dbg("check: is spam? score=".$self->{score}.
> " required=".$self->{conf}->{required_score});
> dbg("check: tests=".$self->get_names_of_tests_hit());
> - dbg("check: subtests=".$self->get_names_of_subtests_hit());
> + dbg("check: subtests=".$self->get_names_of_subtests_hit("dbg"));
> $self->{is_spam} = $self->is_spam();
>
> $self->{main}->{resolver}->bgabort();
> @@ -764,12 +764,52 @@
> normally-hidden rules, which score 0 and have names beginning with two
> underscores, used in meta rules.
>
> +If a parameter of dbg is passed, the output will be more condensed and
> +sub-tests with multiple hits reduced to one entry with the number of hits
> +in parentheses. Some information is also added at the end regarding the
> +multiple hits.
> +
> =cut
>
> sub get_names_of_subtests_hit {
> - my ($self) = @_;
> + my ($self, $mode) = @_;
>
> - return join(',', sort @{$self->{subtest_names_hit}});
> + if (defined $mode && $mode eq 'dbg') {
> + #This routine prints only one instance of a subrule hit with a count of
> how many times it hit if greater than 1
> + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule,
> $total_hits, $deduplicated_hits);
> +
> + $total_hits = scalar(@{$self->{subtest_names_hit}});
> +
> + for ($i=0; $i < $total_hits; $i++) {
> + $rule = ${$self->{subtest_names_hit}}[$i];
> + $subtest_names_hit{$rule}++;
> + }
> +
> + foreach $key (keys %subtest_names_hit) {
> + push (@keys, $key);
> + }
> + @sorted = sort @keys;
> +
> + $deduplicated_hits = scalar(@sorted);
> +
> + for ($i=0; $i < $deduplicated_hits; $i++) {
> + $string .= $sorted[$i];
> + if ($subtest_names_hit{$sorted[$i]} > 1) {
> + $string .= "($subtest_names_hit{$sorted[$i]})"
> + }
> + $string .= ",";
> + }
> +
> + $string =~ s/,$//;
> +
> + $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total Hits:
> $deduplicated_hits)";
> +
> + return $string;
> +
> + } else {
> + #return the simpler string with duplicates and commas
> + return join(',', sort @{$self->{subtest_names_hit}});
> + }
> }
>
> ###########################################################################