https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6152





--- Comment #8 from Steve Freegard <[email protected]>  2009-07-10 15:17:23 PST ---
(In reply to comment #7)
> Unless negative look-ahead assertions do have a significant performance
> impact, we could even do it the other way round and actually define what we
> consider to be a sane offset. Like this.
> 
>   /[-+](?!(?:0\d|1[0-4])(?:[03]0|[14]5))\d{4}$/
> 
> This looks out for a four digit offset, that does not match the sane offsets
> defined in the leading (?! ) part. Probably better comprehensible, apart from
> the reversed logic. ;)
> 
> This one is my proposal.
>

Thanks for all the feedback.  The version above is much easier to read.  My
only comment would be to remove the $ anchor as the offset isn't always at the
end of the date header - consider these examples that I just pulled out from my
spamtrap mailbox:

Date: Tue, 12 May 2009 07:30:19 -0700 (PDT)
Date: Thu, 14 May 2009 18:48:45 +0000 (GMT+00:00)

Although I guess you could handle these cases by changing the end of the
regexp:

\d{4}(?:\s\(\S+\))?$/
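
As a rough illustration (not the actual rule), the combined pattern can be
checked against the two headers above plus a made-up bogus one. This sketch
uses Python's re module as a stand-in for Perl; the names ANCHORED_RE,
RELAXED_RE and has_bogus_offset are hypothetical, and the +1900 sample is
invented for the test.

```python
import re

# Proposal from comment #7: the offset must sit at the very end.
ANCHORED_RE = re.compile(r"[-+](?!(?:0\d|1[0-4])(?:[03]0|[14]5))\d{4}$")

# Same pattern with the suggested tail: an optional " (COMMENT)" may
# follow the offset before the end of the header value.
RELAXED_RE = re.compile(
    r"[-+](?!(?:0\d|1[0-4])(?:[03]0|[14]5))\d{4}(?:\s\(\S+\))?$"
)

def has_bogus_offset(date_value, pattern=RELAXED_RE):
    """Return True if the Date value carries an offset outside the sane set."""
    return pattern.search(date_value) is not None

samples = [
    "Tue, 12 May 2009 07:30:19 -0700 (PDT)",         # from the spamtrap
    "Thu, 14 May 2009 18:48:45 +0000 (GMT+00:00)",   # from the spamtrap
    "Thu, 14 May 2009 18:48:45 +1900 (XYZ)",         # invented bogus offset
]
for value in samples:
    print(value, has_bogus_offset(value, ANCHORED_RE), has_bogus_offset(value))
```

The invented +1900 header is only caught by the relaxed variant, because the
trailing comment keeps the plain $ anchor from matching. Note that \S+ would
not cover a comment that itself contains whitespace.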

> Being slightly more anal, cutting off at +1400, not allowing 14-odd fractions
> either, would just bloat the RE and isn't worth it IMHO. Same for
> differentiating further between positive and negative possible offsets.

Yeah; definitely agree - your proposed regexp is good enough and considerably
better than the current rule without adding unnecessary bloat.
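
To make that trade-off concrete, here is a small sketch of which offsets the
look-ahead treats as sane. Again this is Python standing in for Perl, the
names SANE_CHECK_RE and is_flagged are hypothetical, and the sample offsets
are picked by hand.

```python
import re

# Only the sign, the negative look-ahead and the four digits; no tail.
SANE_CHECK_RE = re.compile(r"[-+](?!(?:0\d|1[0-4])(?:[03]0|[14]5))\d{4}")

def is_flagged(offset):
    """Return True if a bare offset such as '+0530' would hit the rule."""
    return SANE_CHECK_RE.fullmatch(offset) is not None

for offset in ["+0530", "+0545", "+1400", "+1445", "-1400", "+1500", "+0537"]:
    print(offset, "flagged" if is_flagged(offset) else "sane")
```

Offsets such as +1445 and -1400 pass as sane even though no real zone uses
them, which is the looseness accepted above; +1500 and +0537 are flagged.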

