Re: character set / encoding problem?
In an older episode (Sunday 01 May 2005 02:07), Loren Wilton wrote:
> > Again and again, we receive messages that contain stuff like
> > http://advinc-ma=2enetfirms=2ecom/";>
> > instead of
> > http://advinc-ma.netfirms.com/";>
> >
> > That prevents uri / body rules like e.g.
> > no/yes
> > > /netfirms\.com/
> > and URIBL rules from being triggered. I wonder if there is some "function" to
> > automatically "de-code" such items instead of having to use stuff like
> > /netfirms(?:\.|=2e)com/ and how i could use it with SA.
>
> URI rules should be hitting already; certainly on 3.0.

indeed, URI rules hit, thanks for the hint.

> Body rules on 3.0
> may be failing. But then, I'm not sure that 3.0 will have the uri in the
> body text. Rawbody and full rules will certainly fail. After all, that is
> the whole reason the spammers do that extra extraneous encoding.
>
> But then, it is nice that they put that extra encoding in the uris. Makes
> it easy to add points for useless uri encoding. :-)
>
> Loren
Re: character set / encoding problem?
In an older episode (Saturday 30 April 2005 21:41), David B Funk wrote:
> In the meantime, I've coded local rules that explicitly target this bogus
> encoding as a spam sign:
>
> body     L_BOGUS_QP1 /\b=2e(?:com|biz|info|net|org|us)[:\/]\b/
> describe L_BOGUS_QP1 Bogus QuotedPrintable encoding
> score    L_BOGUS_QP1 1.1
>
> meta     L_BOGUS_QP2 (L_BOGUS_QP1 && HTML_MESSAGE)
> describe L_BOGUS_QP2 HTML message that uses Bogus QP
> score    L_BOGUS_QP2 1.5

they don't work for me with the message I enclosed earlier.
why "\b=2e" by the way?

regards,
wolfgang
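One possible reason the rule misses the URL quoted in this thread, sketched in Python rather than in SpamAssassin itself (Python's `re` treats `\b` the same way Perl does for ASCII text). The leading `\b` only demands a word character right before the `=`. The trailing `\b` demands a word character right after the `:` or `/`, and in the quoted URL the slash is followed by a quote mark. The `RELAXED` pattern and the `with_path` URL are hypothetical, not something posted in the thread, and none of this says whether SA hands the URI to body rules at all.

```python
import re

# David's rule body, copied from the thread.
ORIGINAL = re.compile(r"\b=2e(?:com|biz|info|net|org|us)[:\/]\b")

# Hypothetical relaxed variant (not from the thread): look ahead for ":" or "/"
# without demanding a word character after it.
RELAXED = re.compile(r"\b=2e(?:com|biz|info|net|org|us)(?=[:/])")

# URL fragment quoted in the thread; the slash is followed by a quote mark.
sample = 'http://advinc-ma=2enetfirms=2ecom/";>'

# Made-up URL with a path component after the slash, for comparison.
with_path = "http://example=2ecom/page"

for name, pattern in (("original", ORIGINAL), ("relaxed", RELAXED)):
    hits_sample = bool(pattern.search(sample))
    hits_with_path = bool(pattern.search(with_path))
    print(name, hits_sample, hits_with_path)
```

In this sketch the original pattern misses `sample` but hits `with_path`, while the relaxed one hits both. Whether the relaxed form behaves the same inside an SA body rule is untested here.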
Re: character set / encoding problem?
On Sat, Apr 30, 2005 at 02:41:57PM -0500, David B Funk wrote:
> We've already gone 'round this issue in past discussions on this list, the
> DEVs reply was, maybe 'fixed' in future releases.

Ok, fair enough. Then FYI: 3.1 handles the lowercase version. :)

--
Randomly Generated Tagline:
"I was up all night trying to round off infinity." - Bob Lazarus
Re: character set / encoding problem?
On Sat, 30 Apr 2005, Theo Van Dinter wrote:
> On Sat, Apr 30, 2005 at 01:27:39PM +0200, wolfgang wrote:
> > Again and again, we receive messages that contain stuff like
> > http://advinc-ma=2enetfirms=2ecom/";>
> > instead of
> > http://advinc-ma.netfirms.com/";>
> >
> > I wonder if there is some "function" to
> > automatically "de-code" such items instead of having to use stuff like
> > /netfirms(?:\.|=2e)com/ and how i could use it with SA.
>
> "=3d" is quoted-printable encoding for "=", "=2e" for ".", etc...
> SA handles "proper" encoding (it handles a lot of non-proper encoding
> as well), but doesn't make guesses if the MIME part says there is no
> encoding.

No, '=3d' is BOGUS, it is not RFC-compliant quoted-printable encoding.
The MIME RFC states clearly that the hex characters MUST be CAPS
(e.g. '=3D' is valid QP, '=3d' is not).
SA does not handle the bogus form, although many mail clients do.

We've already gone 'round this issue in past discussions on this list; the
DEVs' reply was, maybe 'fixed' in future releases.

In the meantime, I've coded local rules that explicitly target this bogus
encoding as a spam sign:

body     L_BOGUS_QP1 /\b=2e(?:com|biz|info|net|org|us)[:\/]\b/
describe L_BOGUS_QP1 Bogus QuotedPrintable encoding
score    L_BOGUS_QP1 1.1

meta     L_BOGUS_QP2 (L_BOGUS_QP1 && HTML_MESSAGE)
describe L_BOGUS_QP2 HTML message that uses Bogus QP
score    L_BOGUS_QP2 1.5

--
Dave Funk
University of Iowa College of Engineering
319/335-5751 FAX: 319/384-0549
1256 Seamans Center
Sys_admin/Postmaster/cell_admin
Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
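For checking a raw message outside SA, the idea of "QP escapes written with lowercase hex" can be sketched in a few lines of Python. This is only an illustration of the check described above; the function name `lowercase_qp_escapes` is made up for this example and is not part of SpamAssassin.

```python
import re

# "=" followed by exactly two hex digits, in either case.
QP_ESCAPE = re.compile(r"=([0-9A-Fa-f]{2})")


def lowercase_qp_escapes(raw):
    """Return the QP escapes in `raw` that use lowercase hex letters.

    Escapes made only of digits or uppercase letters (e.g. "=3D", "=20")
    are left out, since they match the form the RFC asks senders to use.
    """
    return [
        m.group(0)
        for m in QP_ESCAPE.finditer(raw)
        if m.group(1) != m.group(1).upper()
    ]


if __name__ == "__main__":
    # URL fragment quoted in the thread.
    print(lowercase_qp_escapes("http://advinc-ma=2enetfirms=2ecom/"))
    # Uppercase form, for comparison.
    print(lowercase_qp_escapes("a=3Db"))
```

The first call lists the two `=2e` escapes and the second returns an empty list. A real check would only make sense on parts declared as quoted-printable, which this sketch does not look at.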
Re: character set / encoding problem?
wolfgang wrote:
> In an older episode (Saturday 30 April 2005 14:45), Theo Van Dinter
> wrote:
>> "=3d" is quoted-printable encoding for "=", "=2e" for ".", etc...
>> SA handles "proper" encoding (it handles a lot of non-proper encoding
>> as well), but doesn't make guesses if the MIME part says there is no
>> encoding.

I remember a discussion a while back about this: =2e is invalid while
=2E is valid. But then I searched and found this:

    Rule #1: (General 8-bit representation) Any octet, except those
    indicating a line break according to the newline convention of the
    canonical (standard) form of the data being encoded, may be
    represented by an "=" followed by a two digit hexadecimal
    representation of the octet's value. The digits of the hexadecimal
    alphabet, for this purpose, are "0123456789ABCDEF". Uppercase
    letters must be used when sending hexadecimal data, though a robust
    implementation may choose to recognize lowercase letters on
    receipt. Thus, for example, the value 12 (ASCII form feed) can be
    represented by "=0C", and the value 61 (ASCII EQUAL SIGN) can be
    represented by "=3D". Except when the following rules allow an
    alternative encoding, this rule is mandatory.

It's this line:

    "Uppercase letters must be used when sending hexadecimal data,
    though a robust implementation may choose to recognize lowercase
    letters on receipt."
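A "robust implementation" in the sense of that rule could look roughly like the Python sketch below, which accepts both uppercase and lowercase hex digits on receipt. The function name `decode_qp_lenient` is made up for this example. It is a simplification: decoded octets are mapped straight to code points (as Latin-1 would), and no charset handling is attempted.

```python
import re

# Soft line break: "=" at the very end of a line.
_SOFT_BREAK = re.compile(r"=\r?\n")

# "=" followed by two hex digits; lowercase is tolerated on receipt.
_ESCAPE = re.compile(r"=([0-9A-Fa-f]{2})")


def decode_qp_lenient(text):
    """Decode quoted-printable escapes, accepting lowercase hex digits."""
    # Join lines that were split with a soft line break.
    text = _SOFT_BREAK.sub("", text)
    # Replace every escape in a single pass, so the output of one
    # replacement is never decoded a second time.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    # URL fragment quoted in the thread.
    print(decode_qp_lenient("http://advinc-ma=2enetfirms=2ecom/"))
```

Run on the URL from the thread, this yields `http://advinc-ma.netfirms.com/`, which is the form a plain `/netfirms\.com/` rule would match.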
Re: character set / encoding problem?
In an older episode (Saturday 30 April 2005 14:45), Theo Van Dinter wrote:
> "=3d" is quoted-printable encoding for "=", "=2e" for ".", etc...
> SA handles "proper" encoding (it handles a lot of non-proper encoding
> as well), but doesn't make guesses if the MIME part says there is no
> encoding.
>
> Without samples of the message, it's hard to comment on why something does or
> does not work.

the message headers say

Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

I enclose the message for reference, local user data obfuscated with "xxx".

regards,
wolfgang

--- Begin Message ---
Advertising International Company
More than 7 years of successful working in the market of advertising services

Looking for a perspective and well-paid job in the Germany?

1. If you are looking for a perspective and well-paid work in the Germany (of more than 500 eur. in a week guaranteed).
2. If you are a resident of the Germany (Prefer).
3. If you are over 18 (Obligatory condition).

- Flexible hours
- No financial risk, you do not need pay money for start work!

Check up our site for the further information
Click here (once)
--- End Message ---
Re: character set / encoding problem?
On Sat, Apr 30, 2005 at 01:27:39PM +0200, wolfgang wrote:
> Again and again, we receive messages that contain stuff like
> http://advinc-ma=2enetfirms=2ecom/";>
> instead of
> http://advinc-ma.netfirms.com/";>
>
> I wonder if there is some "function" to
> automatically "de-code" such items instead of having to use stuff like
> /netfirms(?:\.|=2e)com/ and how i could use it with SA.

"=3d" is quoted-printable encoding for "=", "=2e" for ".", etc...
SA handles "proper" encoding (it handles a lot of non-proper encoding
as well), but doesn't make guesses if the MIME part says there is no
encoding.

Without samples of the message, it's hard to comment on why something does or
does not work.

--
Randomly Generated Tagline:
"M: Can anyone tell us the lesson that has been learned here?
 S: Yes Master, not a single one of us could defeat you.
 M: You gain wisdom child ... "
- The Frantics