Re: [Dspam-user] retrain via dspam-xxx@

stevan Wed, 12 May 2010 01:47:30 -0700

> Hi all,
>
Hello,


> I'm afraid I found a bug of sorts in the code which handles the detection
> of "spam-...@domain", used to trigger retraining on false positives/negatives.
>
> Remember I had been struggling around this for days? Dspam was never
> considering my forwards to spam-cyri...@xxx, just doing nothing, thus
> never "learning", thus never finding spam...
>
> I found that, in agent_shared.c :
> function : int process_parseto(AGENT_CTX *ATX, const char *buf)
> -------------------------
>   (...)
>   if (!buf)
>     return EINVAL;
>   h = strstr(buf, "\r\n\r\n");
>   if (!h) h = strstr(buf, "\n\n");        [1]
>   x = strstr(buf, "<spam-");
>  (...)
>   if (x > h) x = NULL;                 [2]
> ---------------------------
>
> For this last line [2], I understand that the goal is to ignore any
> "spam-" or similar, if it is "after" a double CRLF ("empty line"), which
> is probably intended to encode the headers/body boundary.
>
Wrong. The reason for checking for this double CRLF or double LF is the
format of the To header. According to the standard, a mail header (the To
header, for example) can look like this:
To: [email protected]

or

To: [email protected], [email protected]

or

To: [email protected],
        [email protected]

and so on...

The code checking for "\r\n\r\n" or "\n\n" is there to prevent matching
inside the mail body, not to encode the header/body boundary.
IMHO the code should even be extended to prevent checking for
spam/notspam in any header other than "To".
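
A rough sketch of what restricting the match to the To header could look
like, including folded continuation lines (lines starting with a space or
tab). The helper name to_header_contains() and its signature are made up for
illustration and are not part of libdspam; this is one possible approach
under those assumptions, not the planned fix.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libdspam function): returns 1 if needle occurs
 * inside the To header of buf, including folded continuation lines,
 * and 0 otherwise. Scanning stops at the first empty line. */
static int to_header_contains(const char *buf, const char *needle)
{
    const char *p = buf;
    size_t nlen;
    int in_to = 0;

    if (!buf || !needle || *needle == '\0')
        return 0;
    nlen = strlen(needle);

    while (*p) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        size_t i;

        /* An empty line ("\n" or "\r\n") ends the headers. */
        if (len == 0 || (len == 1 && p[0] == '\r'))
            break;

        /* A line starting with space or tab continues the previous header,
         * so in_to keeps its value; any other line starts a new header. */
        if (p[0] != ' ' && p[0] != '\t')
            in_to = len >= 3 &&
                    tolower((unsigned char)p[0]) == 't' &&
                    tolower((unsigned char)p[1]) == 'o' &&
                    p[2] == ':';

        /* Search for needle only within the current line of a To header. */
        if (in_to)
            for (i = 0; i + nlen <= len; i++)
                if (memcmp(p + i, needle, nlen) == 0)
                    return 1;

        if (!eol)
            break;
        p = eol + 1;
    }
    return 0;
}
```

With this sketch, a "<spam-" that shows up only in the Subject header or in
the body is ignored, while one on a folded second line of To is still found.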


> I am not sure whether this is useful, as this function is called with a
> 1-line text buffer (fetched line by line in the caller). But that's just
> for safety, no big deal.
>
You miss the point that this function can be used by other applications
that use libdspam, and nothing prevents a coder using libdspam from
calling the function with: "To: [email protected]\nfrom:
[email protected]\nsubject: [email protected] testing\n\nJust a test\n".
So it is more than just safety.


> But, due to line [1] above, if the line does not contain any
> double-CRLF, h will be NULL when checking x > h (on line [2]) as this is
> not protected. x > NULL will quite often be true, thus x will be reset
> to NULL, and we will never conclude a spam-xxx was found.
>
> I changed line [2] to
>   if (h &&  x > h) x = NULL;
>
> And that works.
>
Yeah, that works. It should, however, be extended to prevent the code from
matching [email protected] in anything other than the "To" header.
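
For reference, a minimal self-contained sketch of the guarded check. The
helper name find_spam_tag() is hypothetical; it only mirrors the quoted
fragment of process_parseto() and is not the real function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the quoted fragment: returns a pointer to
 * the first "<spam-" found before the first empty line in buf, or NULL. */
static const char *find_spam_tag(const char *buf)
{
    const char *h, *x;

    if (!buf)
        return NULL;

    /* Locate the first empty line, if the buffer has one. */
    h = strstr(buf, "\r\n\r\n");
    if (!h)
        h = strstr(buf, "\n\n");

    x = strstr(buf, "<spam-");

    /* Discard the match only when an empty line exists and the match lies
     * after it. Without the h test, a single-line buffer compares x
     * against NULL. */
    if (h && x && x > h)
        x = NULL;

    return x;
}
```

In C, comparing two pointers with > is only defined when both point into the
same object, so the unguarded "x > h" with h == NULL cannot be relied upon
at all, which fits the behaviour described above.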


> I am not sure in which context this could work before, but I guess this
> patch cannot hurt, even if one day, buf contains empty lines.
>
> Hope that helps, it did for me ;-) Tell me if I am wrong on something...
>
I am going to rework that function when I return home from work.


> Regards,
> Cyril'
>
Stevan

> ------------------------------------------------------------------------------
>
> _______________________________________________
> Dspam-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspam-user
>


