Good point, Henrik & John.
OK, I've left the output alone except for the calls from dbg so it
shouldn't break anything in the public interface.
Thoughts on this version?
Regards,
KAM
On 6/4/2019 1:51 PM, John Hardin wrote:
> On Tue, 4 Jun 2019, Kevin A. McGrail wrote:
>
>> Yes, I was thinking about that and wanting to fix uritests so well
>> for the
>> template. Thanks for the feedback. I will take another pass at it.
>
> Just do the deduplication without modifying the output format.
>
> If we want to log the hit counts, then make another function that does
> what you did and use it for logging.
>
>
>> On Tue, Jun 4, 2019, 03:23 Henrik K <[email protected]> wrote:
>>
>>>
>>> If you want to modify debug output, you have to modify only the dbg()
>>> output
>>> itself. You can't modify internal functions that have specific output
>>> formats and start adding random strings to them. Atleast these places
>>> depend on the comma delimited rules:
>>>
>>> ./masses/mass-check: push @tests, split(/,/,
>>> $status->get_names_of_subtests_hit());
>>> ./t/rule_tests.t: my %rules_hit = map { $_ => 1 }
>>> split(/,/,$msg->get_names_of_tests_hit()),
>>> split(/,/,$msg->get_names_of_subtests_hit());
>>> ./t.rules/run: my $testsline =
>>> $status->get_names_of_tests_hit().",".$status->get_names_of_subtests_hit();
>>>
>>>
>>>
>>>
>>> On Tue, Jun 04, 2019 at 01:56:26AM -0400, Kevin A. McGrail wrote:
>>>> Morning All,
>>>>
>>>> After a few thoughts on limits, it appears that any duplicate subtest
>>>> hits are best combined for debug output.
>>>>
>>>> Any thoughts on the attached? It looks like it will help me with rule
>>>> development while support rules with valid but large maxhits like
>>> __LOWER_E
>>>>
>>>> Regards,
>>>> KAM
>>>>
>>>> On 5/31/2019 10:30 AM, Bill Cole wrote:
>>>>> On 30 May 2019, at 20:35, Kevin A. McGrail wrote:
>>>>>
>>>>>> I was curious if anyone noticed the debug output for subtests has
>>> gotten
>>>>>> insane:
>>>>>
>>>>> It got a little discussion on users@ when I created those rules.
>>>>>
>>>>> [...]
>>>>>
>>>>>> 72_active.cf: body __LOWER_E /e/
>>>>>> 72_active.cf: tflags __LOWER_E multiple
>>>>>> maxhits=230
>>>>>>
>>>>>> 72_active.cf: body __E_LIKE_LETTER /<lcase_e>/
>>>>>> 72_active.cf: tflags __E_LIKE_LETTER multiple
>>>>>> maxhits=320
>>>>>>
>>>>>> Assuming those maxhits are correct,
>>>>>
>>>>> They are. In fact they were carefully tuned to catch the targeted
>>>>> extortion spam.
>>>>>
>>>>>> maybe we need something in the debug
>>>>>> output that says __E_LIKE_LETTER (number of hits if more than 1).
>>>>>
>>>>> That would be a useful enhancement even without my flagrant log
>>>>> vandalism.
>>>>>
>>>>
>>>> --
>>>> Kevin A. McGrail
>>>> Member, Apache Software Foundation
>>>> Chair Emeritus Apache SpamAssassin Project
>>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>>>
>>>
>>>> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm
>>>> ===================================================================
>>>> --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582)
>>>> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy)
>>>> @@ -769,7 +769,38 @@
>>>> sub get_names_of_subtests_hit {
>>>> my ($self) = @_;
>>>>
>>>> - return join(',', sort @{$self->{subtest_names_hit}});
>>>> + #return join(',', sort @{$self->{subtest_names_hit}});
>>>> +
>>>> + #This routine prints only one instance of a subrule hit with a
>>>> count
>>> of how many times it hit if greater than 1
>>>> + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule,
>>> $total_hits, $deduplicated_hits);
>>>> +
>>>> + $total_hits = scalar(@{$self->{subtest_names_hit}});
>>>> +
>>>> + for ($i=0; $i < $total_hits; $i++) {
>>>> + $rule = ${$self->{subtest_names_hit}}[$i];
>>>> + $subtest_names_hit{$rule}++;
>>>> + }
>>>> +
>>>> + foreach $key (keys %subtest_names_hit) {
>>>> + push (@keys, $key);
>>>> + }
>>>> + @sorted = sort @keys;
>>>> +
>>>> + $deduplicated_hits = scalar(@sorted);
>>>> +
>>>> + for ($i=0; $i < $deduplicated_hits; $i++) {
>>>> + $string .= $sorted[$i];
>>>> + if ($subtest_names_hit{$sorted[$i]} > 1) {
>>>> + $string .= "($subtest_names_hit{$sorted[$i]})"
>>>> + }
>>>> + $string .= ",";
>>>> + }
>>>> +
>>>> + $string =~ s/,$//;
>>>> +
>>>> + $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total
>>> Hits: $deduplicated_hits)";
>>>> +
>>>> + return $string;
>>>> }
>>>>
>>>>
>>> ###########################################################################
>>>
>>>
>>>
>>
>
--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171
Index: lib/Mail/SpamAssassin/PerMsgStatus.pm
===================================================================
--- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582)
+++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy)
@@ -398,7 +398,7 @@
dbg("check: is spam? score=".$self->{score}.
" required=".$self->{conf}->{required_score});
dbg("check: tests=".$self->get_names_of_tests_hit());
- dbg("check: subtests=".$self->get_names_of_subtests_hit());
+ dbg("check: subtests=".$self->get_names_of_subtests_hit("dbg"));
$self->{is_spam} = $self->is_spam();
$self->{main}->{resolver}->bgabort();
@@ -764,12 +764,52 @@
normally-hidden rules, which score 0 and have names beginning with two
underscores, used in meta rules.
+If a parameter of dbg is passed, the output will be more condensed and
+sub-tests with multiple hits reduced to one entry with the number of hits
+in parentheses. Some information is also added at the end regarding the
+multiple hits.
+
=cut
sub get_names_of_subtests_hit {
- my ($self) = @_;
+ my ($self, $mode) = @_;
- return join(',', sort @{$self->{subtest_names_hit}});
+ if (defined $mode && $mode eq 'dbg') {
+ #This routine prints only one instance of a subrule hit with a count of
how many times it hit if greater than 1
+ my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule,
$total_hits, $deduplicated_hits);
+
+ $total_hits = scalar(@{$self->{subtest_names_hit}});
+
+ for ($i=0; $i < $total_hits; $i++) {
+ $rule = ${$self->{subtest_names_hit}}[$i];
+ $subtest_names_hit{$rule}++;
+ }
+
+ foreach $key (keys %subtest_names_hit) {
+ push (@keys, $key);
+ }
+ @sorted = sort @keys;
+
+ $deduplicated_hits = scalar(@sorted);
+
+ for ($i=0; $i < $deduplicated_hits; $i++) {
+ $string .= $sorted[$i];
+ if ($subtest_names_hit{$sorted[$i]} > 1) {
+ $string .= "($subtest_names_hit{$sorted[$i]})"
+ }
+ $string .= ",";
+ }
+
+ $string =~ s/,$//;
+
+ $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total Hits:
$deduplicated_hits)";
+
+ return $string;
+
+ } else {
+ #return the simpler string with duplicates and commas
+ return join(',', sort @{$self->{subtest_names_hit}});
+ }
}
###########################################################################