On 07/14/2013 01:00 PM, Mark Martinec wrote:
On 2013-07-13 15:41, Axb wrote:
my weekly masscheck, which just ran a while ago, spit out a huge list of

Malformed UTF-8 character (unexpected non-continuation byte 0x6e,
immediately after start byte 0xf6) in transliteration (tr///) at
/data/masscheckwork/weekly_mass_check/masses/../lib/Mail/SpamAssassin/DnsResolver.pm line 627.

Sounds similar to
   [Bug 6945] sa-learn dies on non-ASCII characters in Message-ID

I'd be interested in a sample message producing this.
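
For what it's worth, here is a minimal stand-alone repro of that exact
warning, assuming the string was wrongly flagged as UTF-8 somewhere
upstream (the bytes 0xf6 0x6e are just Latin-1 "ön"):

  use Encode ();
  my $s = "\xf6n";       # raw Latin-1 bytes for "ön"
  Encode::_utf8_on($s);  # set the UTF-8 flag without re-encoding
  $s =~ tr/a-z/A-Z/;     # warns: Malformed UTF-8 character (unexpected
                         #   non-continuation byte 0x6e, immediately
                         #   after start byte 0xf6) in transliteration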

It is open to discussion whether URLs should be treated as plain
bytes when submitted to DNS queries and the like, or whether they
should be subject to character decoding according to the MIME
content charset. So the fix belongs either at the DNS resolver
stage (encoding characters back to bytes), or (more correctly, IMO)
at the text parsing stage, where decoding to characters can still
be prevented.
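
A sketch of the resolver-stage variant, with a hypothetical helper
name (not an actual patch, just the idea): downgrade a character
string back to UTF-8 octets before it reaches tr/// or the packet
builder:

  use Encode qw(encode_utf8);

  # Hypothetical guard: if the domain carries Perl's internal
  # UTF-8 flag, re-encode it to raw octets so the resolver only
  # ever sees bytes.
  sub domain_as_octets {
      my ($domain) = @_;
      return utf8::is_utf8($domain) ? encode_utf8($domain) : $domain;
  }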


also way too many warnings like:

dns: new_dns_packet (domain=g-ecx..images-amazon.com. type=A class=IN)
failed: a domain name contains a null label

dns: new_dns_packet (domain=www.pearl..de. type=A class=IN) failed: a
domain name contains a null label

dns: new_dns_packet (domain=new..itunes.com. type=A class=IN) failed: a
domain name contains a null label

where domains contain two consecutive periods, as in g-ecx..images-amazon.com

I checked a few of these messages and none of the domains had borked URIs.

Seems something is not parsing correctly.

Anybody else? Mark, any idea where I should start looking?

The cases with double dots (= empty label fields) which I have
investigated all turned out to be URLs incorrectly encoded by the
sender, i.e. not a SpamAssassin decoding bug. If you see other
cases, I'd again be interested in a sample.

If it is the sender's mistake, the warning can be demoted to a
debug message if it proves too intrusive, or perhaps the URL
decoder in SpamAssassin could remove empty label fields, following
the logic of "do what I mean, not what I say".

(I'm not subscribed to ruleqa ML, sorry)

In the meantime I wiped the only ham corpus which produced these
errors and have started collecting new stuff.
Will watch for any new samples and report.

Thanks Mark

