https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8268
Bug ID: 8268
Summary: trim whitespace from anchor text in uri_detail_list
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: PC
OS: Linux
Status: NEW
Severity: minor
Priority: P2
Component: Libraries
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: Undefined
It would be convenient if leading and trailing whitespace were removed from
anchor_text in uri_detail_list. For example, HTML such as:
<a href="#">
Download File
</a>
will end up with anchor_text containing "\n Download File\n". This leads to
unexpected results if you have a rule such as:
uri-detail RULENAME text =~ /^download file$/i
The workaround is to not use regex anchors, or explicitly allow whitespace in
the regex:
uri-detail RULENAME text =~ /^\s*download file/i
However, I think this is non-intuitive, and it has tripped me up several times. I
don't think there is any harm in removing the whitespace, since the rules of
HTML whitespace dictate that the HTML above should render the same as this
HTML:
<a href="#">Download File</a>
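To illustrate the behaviour described above, here is a minimal sketch in
Python. It is not the attached patch and not SpamAssassin code: Python's re
module stands in for Perl regex matching, and trim_anchor_text is a
hypothetical helper that only shows the proposed trimming.

```python
import re


def trim_anchor_text(text: str) -> str:
    """Hypothetical helper: strip leading and trailing whitespace."""
    return text.strip()


# Anchor text as reported for the multi-line <a> element above.
raw = "\n Download File\n"

# The anchored rule pattern and the workaround pattern from this report.
strict = re.compile(r"^download file$", re.IGNORECASE)
lenient = re.compile(r"^\s*download file", re.IGNORECASE)

strict_on_raw = bool(strict.search(raw))          # leading "\n " defeats ^
lenient_on_raw = bool(lenient.search(raw))        # workaround tolerates it
strict_on_trimmed = bool(strict.search(trim_anchor_text(raw)))

print(strict_on_raw, lenient_on_raw, strict_on_trimmed)  # False True True
```

With the whitespace trimmed, the anchored pattern matches without any
workaround in the rule itself.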
Please see the attached patch and provide feedback.