Well in theory you see _more_ debug output now when there are no duplicates,
due to the stats string..  honestly atleast I wouldn't care about that. 
Feel free to vote.

As a silly morning exercise, here's a one-liner that compacts stuff :-P

my $foo = '__A,__B,__C,__C,__C,__CC,__D,__D,__E,__E';
my $m; $foo =~ s/([^,]+)(?{$m=1})(?:,\1(?=,|$)(?{$m++}))+/"$1($m)"/eg;

__A,__B,__C(3),__CC,__D(2),__E(2)


On Wed, Jun 05, 2019 at 08:25:00PM -0400, Kevin A. McGrail wrote:
> Good point, Henrik & John.
> 
> OK, I've left the output alone except for the calls from dbg so it
> shouldn't break anything in the public interface.
> 
> Thoughts on this version?
> 
> Regards,
> KAM
> 
> On 6/4/2019 1:51 PM, John Hardin wrote:
> > On Tue, 4 Jun 2019, Kevin A. McGrail wrote:
> >
> >> Yes, I was thinking about that and wanting to fix uritests so well
> >> for the
> >> template.   Thanks for the feedback.  I will take another pass at it.
> >
> > Just do the deduplication without modifying the output format.
> >
> > If we want to log the hit counts, then make another function that does
> > what you did and use it for logging.
> >
> >
> >> On Tue, Jun 4, 2019, 03:23 Henrik K <[email protected]> wrote:
> >>
> >>>
> >>> If you want to modify debug output, you have to modify only the dbg()
> >>> output
> >>> itself.  You can't modify internal functions that have specific output
> >>> formats and start adding random strings to them.  Atleast these places
> >>> depend on the comma delimited rules:
> >>>
> >>> ./masses/mass-check:    push @tests, split(/,/,
> >>> $status->get_names_of_subtests_hit());
> >>> ./t/rule_tests.t:    my %rules_hit = map { $_ => 1 }
> >>> split(/,/,$msg->get_names_of_tests_hit()),
> >>> split(/,/,$msg->get_names_of_subtests_hit());
> >>> ./t.rules/run:  my $testsline =
> >>> $status->get_names_of_tests_hit().",".$status->get_names_of_subtests_hit();
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jun 04, 2019 at 01:56:26AM -0400, Kevin A. McGrail wrote:
> >>>> Morning All,
> >>>>
> >>>> After a few thoughts on limits, it appears that any duplicate subtest
> >>>> hits are best combined for debug output.
> >>>>
> >>>> Any thoughts on the attached?  It looks like it will help me with rule
> >>>> development while support rules with valid but large maxhits like
> >>> __LOWER_E
> >>>>
> >>>> Regards,
> >>>> KAM
> >>>>
> >>>> On 5/31/2019 10:30 AM, Bill Cole wrote:
> >>>>> On 30 May 2019, at 20:35, Kevin A. McGrail wrote:
> >>>>>
> >>>>>> I was curious if anyone noticed the debug output for subtests has
> >>> gotten
> >>>>>> insane:
> >>>>>
> >>>>> It got a little discussion on users@ when I created those rules.
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>> 72_active.cf:    body            __LOWER_E       /e/
> >>>>>> 72_active.cf:    tflags          __LOWER_E       multiple
> >>>>>> maxhits=230
> >>>>>>
> >>>>>> 72_active.cf:    body            __E_LIKE_LETTER /<lcase_e>/
> >>>>>> 72_active.cf:    tflags          __E_LIKE_LETTER multiple
> >>>>>> maxhits=320
> >>>>>>
> >>>>>> Assuming those maxhits are correct,
> >>>>>
> >>>>> They are. In fact they were carefully tuned to catch the targeted
> >>>>> extortion spam.
> >>>>>
> >>>>>> maybe we need something in the debug
> >>>>>> output that says __E_LIKE_LETTER (number of hits if more than 1).
> >>>>>
> >>>>> That would be a useful enhancement even without my flagrant log
> >>>>> vandalism.
> >>>>>
> >>>>
> >>>> -- 
> >>>> Kevin A. McGrail
> >>>> Member, Apache Software Foundation
> >>>> Chair Emeritus Apache SpamAssassin Project
> >>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
> >>>>
> >>>
> >>>> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm
> >>>> ===================================================================
> >>>> --- lib/Mail/SpamAssassin/PerMsgStatus.pm       (revision 1860582)
> >>>> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm       (working copy)
> >>>> @@ -769,7 +769,38 @@
> >>>>  sub get_names_of_subtests_hit {
> >>>>    my ($self) = @_;
> >>>>
> >>>> -  return join(',', sort @{$self->{subtest_names_hit}});
> >>>> +  #return join(',', sort @{$self->{subtest_names_hit}});
> >>>> +
> >>>> +  #This routine prints only one instance of a subrule hit with a
> >>>> count
> >>> of how many times it hit if greater than 1
> >>>> +  my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule,
> >>> $total_hits, $deduplicated_hits);
> >>>> +
> >>>> +  $total_hits = scalar(@{$self->{subtest_names_hit}});
> >>>> +
> >>>> +  for ($i=0; $i < $total_hits; $i++) {
> >>>> +    $rule = ${$self->{subtest_names_hit}}[$i];
> >>>> +    $subtest_names_hit{$rule}++;
> >>>> +  }
> >>>> +
> >>>> +  foreach $key (keys %subtest_names_hit) {
> >>>> +    push (@keys, $key);
> >>>> +  }
> >>>> +  @sorted = sort @keys;
> >>>> +
> >>>> +  $deduplicated_hits = scalar(@sorted);
> >>>> +
> >>>> +  for ($i=0; $i < $deduplicated_hits; $i++) {
> >>>> +    $string .= $sorted[$i];
> >>>> +    if ($subtest_names_hit{$sorted[$i]} > 1) {
> >>>> +      $string .= "($subtest_names_hit{$sorted[$i]})"
> >>>> +    }
> >>>> +    $string .= ",";
> >>>> +  }
> >>>> +
> >>>> +  $string =~ s/,$//;
> >>>> +
> >>>> +  $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total
> >>> Hits: $deduplicated_hits)";
> >>>> +
> >>>> +  return $string;
> >>>>  }
> >>>>
> >>>>
> >>> ###########################################################################
> >>>
> >>>
> >>>
> >>
> >
> 
> -- 
> Kevin A. McGrail
> Member, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
> 

> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm
> ===================================================================
> --- lib/Mail/SpamAssassin/PerMsgStatus.pm       (revision 1860582)
> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm       (working copy)
> @@ -398,7 +398,7 @@
>    dbg("check: is spam? score=".$self->{score}.
>                          " required=".$self->{conf}->{required_score});
>    dbg("check: tests=".$self->get_names_of_tests_hit());
> -  dbg("check: subtests=".$self->get_names_of_subtests_hit());
> +  dbg("check: subtests=".$self->get_names_of_subtests_hit("dbg"));
>    $self->{is_spam} = $self->is_spam();
>  
>    $self->{main}->{resolver}->bgabort();
> @@ -764,12 +764,52 @@
>  normally-hidden rules, which score 0 and have names beginning with two
>  underscores, used in meta rules.
>  
> +If a parameter of dbg is passed, the output will be more condensed and 
> +sub-tests with multiple hits reduced to one entry with the number of hits 
> +in parentheses. Some information is also added at the end regarding the 
> +multiple hits.
> +
>  =cut
>  
>  sub get_names_of_subtests_hit {
> -  my ($self) = @_;
> +  my ($self, $mode) = @_;
>  
> -  return join(',', sort @{$self->{subtest_names_hit}});
> +  if (defined $mode && $mode eq 'dbg') {
> +    #This routine prints only one instance of a subrule hit with a count of 
> how many times it hit if greater than 1
> +    my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule, 
> $total_hits, $deduplicated_hits);  
> +  
> +    $total_hits = scalar(@{$self->{subtest_names_hit}});
> +  
> +    for ($i=0; $i < $total_hits; $i++) {
> +      $rule = ${$self->{subtest_names_hit}}[$i]; 
> +      $subtest_names_hit{$rule}++; 
> +    }
> +  
> +    foreach $key (keys %subtest_names_hit) {
> +      push (@keys, $key);
> +    }
> +    @sorted = sort @keys;
> +  
> +    $deduplicated_hits = scalar(@sorted);
> +  
> +    for ($i=0; $i < $deduplicated_hits; $i++) {
> +      $string .= $sorted[$i];
> +      if ($subtest_names_hit{$sorted[$i]} > 1) {
> +        $string .= "($subtest_names_hit{$sorted[$i]})"
> +      }
> +      $string .= ",";
> +    }
> +  
> +    $string =~ s/,$//;
> +  
> +    $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total Hits: 
> $deduplicated_hits)";
> +  
> +    return $string;
> +
> +  } else {
> +    #return the simpler string with duplicates and commas
> +    return join(',', sort @{$self->{subtest_names_hit}});
> +  }
>  }
>  
>  ###########################################################################

Reply via email to