Re: Benchmarking SA

2016-07-15 Thread RW
On Fri, 15 Jul 2016 12:00:59 -0400
Olivier Coutu wrote:

> On 2016-07-15 10:22, Reindl Harald wrote:
> >
> >
> > Am 15.07.2016 um 16:06 schrieb Olivier Coutu:  
> >> I am trying to figure out what part of SA is taking the most time
> >> on certain e-mails, e.g
> >>
> >> time spamassassin ham-1468528393442166.eml
> >>
> >> [...]
> >>
> >> real0m34.531s
> >>
> >> user0m33.958s
> >>
> >> sys0m0.452s
> >>
> >>

> Some DNS lookups indeed seem to take a certain amount of time, but
> the first 4-second hop cannot be explained that way. Is there a way
> that I could limit the number of DNS lookups being made, say to max
> 50?

Take a look at the parameters to rbl_timeout.

However, the timings above suggest that this has nothing to do with
network delays. If it were then "user"+"sys" would be a few percent of
"real".

IIWY I'd be more worried about why it's using so much cpu time.


Re: Benchmarking SA

2016-07-15 Thread Reindl Harald



Am 15.07.2016 um 18:00 schrieb Olivier Coutu:

On 2016-07-15 10:22, Reindl Harald wrote:

just use spamassassin -D which gives you some timing informations,
most likely DNS - DNSBL/DNSWL/URIBL


Some DNS lookups indeed seem to take a certain amount of time, but the
first 4-second hop cannot be explained that way. Is there a way that I
could limit the number of DNS lookups being made, say to max 50?


you have a timing report at the end - just look closer

it makes *no sense* to limit dns lookups - which ones would you skip - 
slower ones?


what when the slowest ones are the RBL/URIBL hits which would have been 
the most important of the whole scan


when would you decide what is fast and what is slow - they are all fired 
asynchron at the same time - so it's not possible

_

at the end before the sa-report you have something like this

Jul 15 18:08:46.165 [6210] dbg: check: 
tests=BAYES_99,BAYES_999,DATE_IN_PAST_96_XX,HTML_IMAGE_ONLY_32,HTML_MESSAGE,MISSING_HEADERS
Jul 15 18:08:46.165 [6210] dbg: check: 
subtests=__ANY_TEXT_ATTACH,__ANY_TEXT_ATTACH_DOC,__BODY_TEXT_LINE,__BODY_TEXT_LINE,__BODY_TEXT_LINE,__CLICK_HERE,__CR_IN_SUBJ,__CT,__CTE,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__CTYPE_MULTIPART_ANY,__DKIM_DEPENDABLE,__DOS_HAS_ANY_URI,__DOS_LINK,__DOS_RCVD_MON,__ENV_AND_HDR_FROM_MATCH,__FB_COST,__FB_NUM_PERCNT,__FROM_FULL_NAME,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_DATE,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_URI,__HUSH_HUSH,__JM_REACTOR_DATE,__LCL__ENV_AND_HDR_FROM_MATCH,__LOCAL_PP_NONPPURL,__MIME_HTML,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__NONEMPTY_BODY,__NOT_A_PERSON,__NOT_SPOOFED,__RATWARE_0_TZ_DATE,__SANE_MSGID,__SUBSCRIPTION_INFO,__TO_NO_ARROWS_R,__TO_NO_BRKTS_PCNT,__TVD_MIME_ATT_TP,__UNPARSEABLE_RELAY_COUNT,__UNSUB_LINK,__URI_MAILTO
Jul 15 18:08:46.166 [6210] dbg: timing: total 2373 ms - init: 754 
(31.8%), parse: 0.61 (0.0%), extract_message_metadata: 39 (1.6%), 
get_uri_detail_list: 3.7 (0.2%), tests_pri_-1000: 7 (0.3%), compile_gen: 
107 (4.5%), compile_eval: 16 (0.7%), tests_pri_-950: 3.2 (0.1%), 
tests_pri_-900: 6 (0.3%), tests_pri_-500: 28 (1.2%), dkim_load_modules: 
15 (0.6%), check_dkim_signature: 0.32 (0.0%), tests_pri_-400: 17 (0.7%), 
check_bayes: 13 (0.6%), b_tokenize: 7 (0.3%), b_tok_get_all: 2.5 (0.1%), 
b_comp_prob: 2.4 (0.1%), b_tok_touch_all: 0.13 (0.0%), b_finish: 0.81 
(0.0%), tests_pri_0: 1173 (49.4%), check_spf: 21 (0.9%), 
check_dkim_adsp: 87 (3.7%), check_razor2: 552 (23.2%), check_pyzor: 139 
(5.9%), tests_pri_500: 341 (14.4%), poll_dns_idle: 296 (12.5%)





signature.asc
Description: OpenPGP digital signature


Re: Benchmarking SA

2016-07-15 Thread Olivier Coutu


On 2016-07-15 10:22, Reindl Harald wrote:



Am 15.07.2016 um 16:06 schrieb Olivier Coutu:

I am trying to figure out what part of SA is taking the most time on
certain e-mails, e.g

time spamassassin ham-1468528393442166.eml

[...]

real0m34.531s

user0m33.958s

sys0m0.452s


I have installed HitFreqsRuleTiming and the timing result file has
specific rules that takes long, but the total is about 21 seconds:

awk '{print $3}' timing.log | awk '{s+=$1}END{print s}'

21.4295

I was wondering if there was a way to find out what the extra 13 seconds
were used for. Also, would there be a better benchmarking strategy?

I have attempted to look at the debug, but the timing seems to skip
multiple seconds at random intervals that are not indicative of what
rule ran


just use spamassassin -D which gives you some timing informations, 
most likely DNS - DNSBL/DNSWL/URIBL



I have looked at -D:
spamassassin -D 2>&1 < ham-1468528393442166.eml

I am still getting multi-second hops in the debug outputs:
[...]
jui 15 11:47:28.527 [29298] dbg: rules: Z_HAMMY_SPF (pri -60) requires 
__ENV_FROM_4SQUARE (pri 0): fixed
jui 15 11:47:34.402 [29298] dbg: rules: Z_HAMMY_SPF (pri -60) requires 
__ENV_FROM_DIGG (pri 0): fixed

[...]
jui 15 11:47:36.723 [29298] dbg: dns: dns reply 5194 is OK, 1 answer records
jui 15 11:47:46.829 [29298] dbg: async: calling callback on key 
A:ns1.dnsmadeeasy.com

[...]
jui 15 11:47:46.853 [29298] dbg: async: calling callback on key 
DNSBL:2.126.80.208:zen.spamhaus.org, rule URIBL_SBL
jui 15 11:52:07.569 [29298] dbg: uridnsbl: complete_dnsbl_lookup 
URIBL_SBL DNSBL:2.126.80.208:zen.spamhaus.org

[...]
jui 15 11:52:07.594 [29298] dbg: rules: flush_evalstr 
(run_generic_tests) compiling 469 chars of 
Mail::SpamAssassin::Plugin::Check::_uri_tests_neg950_1
jui 15 11:52:19.274 [29298] dbg: rules: run_generic_tests - compiling 
eval code: uri, priority -950

[...]

Some DNS lookups indeed seem to take a certain amount of time, but the 
first 4-second hop cannot be explained that way. Is there a way that I 
could limit the number of DNS lookups being made, say to max 50?


Re: Benchmarking SA

2016-07-15 Thread Reindl Harald



Am 15.07.2016 um 16:06 schrieb Olivier Coutu:

I am trying to figure out what part of SA is taking the most time on
certain e-mails, e.g

time spamassassin ham-1468528393442166.eml

[...]

real0m34.531s

user0m33.958s

sys0m0.452s


I have installed HitFreqsRuleTiming and the timing result file has
specific rules that takes long, but the total is about 21 seconds:

awk '{print $3}' timing.log | awk '{s+=$1}END{print s}'

21.4295

I was wondering if there was a way to find out what the extra 13 seconds
were used for. Also, would there be a better benchmarking strategy?

I have attempted to look at the debug, but the timing seems to skip
multiple seconds at random intervals that are not indicative of what
rule ran


just use spamassassin -D which gives you some timing informations, most 
likely DNS - DNSBL/DNSWL/URIBL




signature.asc
Description: OpenPGP digital signature


Benchmarking SA

2016-07-15 Thread Olivier Coutu
I am trying to figure out what part of SA is taking the most time on 
certain e-mails, e.g


time spamassassin ham-1468528393442166.eml

[...]

real0m34.531s

user0m33.958s

sys0m0.452s


I have installed HitFreqsRuleTiming and the timing result file has 
specific rules that takes long, but the total is about 21 seconds:


awk '{print $3}' timing.log | awk '{s+=$1}END{print s}'

21.4295


I was wondering if there was a way to find out what the extra 13 seconds 
were used for. Also, would there be a better benchmarking strategy?


I have attempted to look at the debug, but the timing seems to skip 
multiple seconds at random intervals that are not indicative of what 
rule ran.


--
Olivier Coutu
Assistance technique
Technical Support
T : 514-527-3232 x 2
n...@zerospam.ca