> Hi all,
>
Hello,
> I'm afraid I found a bug of sorts in the code that handles the detection
> of "spam-...@domain", which triggers retraining on false positives/negatives.
>
> Remember I had been struggling with this for days? Dspam was ignoring
> my forwards to spam-cyri...@xxx and doing nothing, so it never
> "learned" and never caught any spam...
>
> I found this in agent_shared.c,
> function: int process_parseto(AGENT_CTX *ATX, const char *buf)
> -------------------------
> (...)
> if (!buf)
>   return EINVAL;
> h = strstr(buf, "\r\n\r\n");
> if (!h) h = strstr(buf, "\n\n"); [1]
> x = strstr(buf, "<spam-");
> (...)
> if (x > h) x = NULL; [2]
> ---------------------------
>
> For this last line [2], I understand that the goal is to ignore any
> "spam-" or similar if it appears after a double CRLF (an empty line),
> which is probably intended to encode the header/body boundary.
>
Wrong. The reason for checking for a double CRLF or double LF is the
format of the To header. According to the standard, a mail header (for
example the To header) can look like this:
To: [email protected]
or
To: [email protected], [email protected]
or
To: [email protected],
    [email protected]
and so on...
The check for "\r\n\r\n" or "\n\n" is there to prevent matching inside
the mail body. It is not there to encode the header/body boundary.
IMHO the code should even be extended to stop checking for spam/notspam
in any header other than "To".
> I am not sure whether this is useful, as this function is called with a
> 1-line text buffer (fetched line by line in the caller). But that's just
> for safety, no big deal.
>
You are missing the point that this function can be used by other
applications that use libdspam, and nothing prevents a programmer using
libdspam from calling it with: "To: [email protected]\nfrom: [email protected]\nsubject: [email protected] testing\n\nJust a test\n".
So it is more than just safety.
> But because of line [1] above, if the line does not contain a double
> CRLF (or double LF), h is NULL when x > h is evaluated on line [2], since
> that comparison is not guarded. x > NULL is true for practically any
> non-NULL x, so x is reset to NULL and we never conclude that a spam-xxx
> address was found.
>
> I changed line [2] to
> if (h && x > h) x = NULL;
>
> And that works.
>
Yes, that works. It should, however, be extended to keep the code from
matching [email protected] in anything other than the "To" header.
> I am not sure in which context this could have worked before, but I
> guess this patch cannot hurt, even if buf one day contains empty lines.
>
> Hope that helps, it did for me ;-) Tell me if I am wrong about anything...
>
I am going to rework that function when I get home from work.
> Regards,
> Cyril
>
Stevan
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Dspam-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspam-user