Re: spoofing mail
On 29.11.18 09:30, Rupert Gallagher wrote:
> Message-ID and To have the same domain, but From does not. You should have
> never received that mail.

This happens when the Message-ID is added by the recipient's mail server. It should hit MSGID_FROM_MTA_HEADER.

And yes, there could be a rule that catches a Message-ID added by an internal server. Note that:
- Message-ID is not required (it is a SHOULD in the RFC)
- many mail servers add a Message-ID if it doesn't exist.

> On Wed, Nov 28, 2018 at 19:15, Rick Gutierrez wrote:
>> On Wed, 28 Nov 2018 at 6:03, Christian Grunfeld wrote:
>>> Hi, this is a log. Could you paste the email headers? cheers
>>
>> I do not know if it is useful, the amavisd + spamassassin I have it in
>> front of the mail server.
>>
>> https://pastebin.com/ktMUDLps

not available anymore :-(

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
The 3 biggest disasters: Hiroshima 45, Tschernobyl 86, Windows 95
Re: Bayes underperforming, HTML entities?
On Thu, 29 Nov 2018 22:33:12 -0700 Amir Caspi wrote:

> On Nov 29, 2018, at 10:11 PM, Bill Cole wrote:
> >
> > I have no issue with adding a new rule type to act on the output of
> > a partial well-defined HTML parsing, something in between 'rawbody'
> > and 'body' types, but overloading normalize_charset with that and
> > so affecting every existing rule of all body-oriented rule types
> > would be a bad design.
>
> The problem as I see it is that spammers are using HTML encoding as
> effectively another charset, and as a way of obfuscating like they
> did/do with Unicode lookalikes... but unless those HTML characters
> are translated there is no way to catch this obfuscation.

normalize_charset is about converting text from whatever character set it's in to UTF-8, and nothing else.

SpamAssassin should already decode HTML to text for body rules. Rules matching the HTML entities use rawbody specifically to avoid having them converted to plain text.

The most substantial problem here is that these invisible characters make it very hard to write ordinary body rules.
Re: Bayes underperforming, HTML entities?
On Nov 30, 2018, at 6:09 AM, RW wrote:
>
> The most substantial problem here is that these invisible characters
> make it very hard to write ordinary body rules.

Thanks for the clarification on my confusion. Since HTML is already getting rendered to text, then perhaps the conversion code should strip (literally, just delete) any zero-width characters during this conversion? That should make normal body rules, and Bayes, function properly, no?

Is there a reason not to strip out zero-width characters? That is, is there any benefit or reason to maintain invisible chars versus throwing them out?

Thanks!

--- Amir
Re: --virtual-config-dir=pattern is not substituted
On 29 Nov 2018, at 8:06, Eggert Ehmke wrote:

> Strange, I am missing that configuration in /etc/postfix/master.cf.
> Will add them.

Please be careful. It is *possible* to have SpamAssassin hooked into the mail acceptance and delivery flow in many different ways. I only (vaguely) described the most common way that I've seen for doing so with the "spamd" daemon involved, which is indicated by your use of '--virtual-config-dir' and a log file named 'spamd.log'. If your configuration was completely missing any relevant entry in master.cf, it could be that SA was being used via Dovecot or via an access map FILTER result. Make sure you understand how your plumbing works before changing it.

> On Thursday, 29 November 2018, 01:15:39 CET, Bill Cole wrote:
>> On 28 Nov 2018, at 17:53, Eggert Ehmke wrote:
>>> Do you mean the --username option in /etc/default/spamassassin?
>>
>> No. Postfix is running the 'spamc' program in some fashion, usually
>> via a pipe transport configured in master.cf. That transport
>> (typically an intermediary script) needs to be passed the recipient
>> address by Postfix and may need to transform it in some fashion
>> (e.g. strip the domain maybe) to use it as the argument to the '-u'
>> option in an invocation of spamc.
>>
>>> It is set to the generic user
>>> --username=debian-spamd
>>>
>>> Thank you
>>>
>>> On Wednesday, 28 November 2018, 22:41:38 CET, RW wrote:
>>>> On Tue, 27 Nov 2018 18:01:04 +0100 Eggert Ehmke wrote:
>>>>> I have Spamassassin running on Debian with Postfix, Dovecot etc.
>>>>> It seems to work, Spam is filtered to my Quarantine. I have some
>>>>> virtual mailboxes in /var/mail/vhosts and have set up the Option
>>>>> -x --virtual-config-dir=/var/mail/vhosts/%d/%l/spamassassin
>>>>> This does not work, in the log file
>>>>> /var/log/spamassassin/spamd.log I find these lines:
>>>>>
>>>>> warn: plugin: eval failed: bayes: (in learn) locker: safe_lock: cannot create tmp lockfile /var/mail/vhosts///spamassassin/bayes.lock.domain.de.3653 for /var/mail/vhosts///spa
>>>>>
>>>>> So the user name and the domain are not replaced in the pattern.
>>>>> What may be wrong??
>>>>
>>>> Are you sure the recipient address is being passed to spamc via
>>>> the -u option?

-- 
Bill Cole
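The empty path components in the log line above are what one would expect if the username handed to spamd has no domain part. The following is only a rough sketch of that substitution in Python, not spamd's actual code: the function name `expand_pattern` is made up, the handling of a username without "@" (both %l and %d left empty) is an assumption inferred from the "/var/mail/vhosts///spamassassin" path in the log, and the address eggert@domain.de is a hypothetical example.

```python
import re

def expand_pattern(pattern: str, username: str) -> str:
    """Expand %u, %l, %d and %% in a --virtual-config-dir style pattern.

    Assumption: a username without '@' leaves both %l and %d empty,
    which is what the lock-file path in the log suggests.
    """
    local, domain = "", ""
    if "@" in username:
        local, _, domain = username.rpartition("@")
    table = {"u": username, "l": local, "d": domain, "%": "%"}
    return re.sub(r"%([uld%])", lambda m: table[m.group(1)], pattern)

pattern = "/var/mail/vhosts/%d/%l/spamassassin"

# Generic daemon user, as with --username=debian-spamd and no 'spamc -u':
print(expand_pattern(pattern, "debian-spamd"))

# Full recipient address passed through, e.g. via 'spamc -u' (hypothetical address):
print(expand_pattern(pattern, "eggert@domain.de"))
```

If this reading is right, the pattern itself is fine and the fix lies in how the transport invokes spamc, which matches the question about the -u option.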
Re: Bayes underperforming, HTML entities?
On 30 Nov 2018, at 8:29, Amir Caspi wrote:

> On Nov 30, 2018, at 6:09 AM, RW wrote:
>> The most substantial problem here is that these invisible characters
>> make it very hard to write ordinary body rules.
>
> Thanks for the clarification on my confusion. Since HTML is already
> getting rendered to text, then perhaps the conversion code should
> strip (literally, just delete) any zero-width characters during this
> conversion? That should make normal body rules, and Bayes, function
> properly, no?

Not if they are *looking for* those characters.

> Is there a reason not to strip out zero-width characters? That is, is
> there any benefit or reason to maintain invisible chars versus
> throwing them out?

The presence of zero-width characters is a very strong spam indicator. It isn't quite perfect, however, since at least one procedurally legitimate and rather popular US entity is sending mail that people affirmatively want to receive like this: https://www.scconsult.com/atkspam.txt

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole
Re: Bayes underperforming, HTML entities?
On Fri, 30 Nov 2018 06:29:31 -0700 Amir Caspi wrote:

> On Nov 30, 2018, at 6:09 AM, RW wrote:
> >
> > The most substantial problem here is that these invisible characters
> > make it very hard to write ordinary body rules.
>
> Thanks for the clarification on my confusion. Since HTML is already
> getting rendered to text, then perhaps the conversion code should
> strip (literally, just delete) any zero-width characters during this
> conversion? That should make normal body rules, and Bayes, function
> properly, no?
>
> Is there a reason not to strip out zero-width characters? That is, is
> there any benefit or reason to maintain invisible chars versus
> throwing them out?

It makes it harder to write rules detecting these tricks, but it may happen eventually. As far as Bayes is concerned, it would be a shame to lose the information.

What I think might be a good compromise is to normalize out all invisible and high quality obfuscations, but add the original and normalized words to two metadata headers.

So, if [A] represents a homoglyph for 'a' and [I] is an invisible character, then the text

  my m[A]lware h[I]as copied your addr[I]ess book

would be converted to

  my malware has copied your address book

with the generation of

  X-Obfuscated-Orig: m[A]lware h[I]as addr[I]ess
  X-Obfuscated-Norm: malware has address

It would be possible to run header rules against either pseudo-header. Bayes would ignore X-Obfuscated-Orig and tokenize X-Obfuscated-Norm with a dedicated prefix. Most common English words from that header would be strongly spammy.
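To make the proposal concrete, here is a minimal Python sketch of such a normalization pass. It is not SpamAssassin code; the function names, the small set of zero-width code points, and the three-entry homoglyph table (Cyrillic letters standing in for Latin ones) are all assumptions chosen for illustration, and a real table would be far larger.

```python
# Zero-width / invisible code points to strip (illustrative subset).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Tiny homoglyph table: Cyrillic a, e, o mapped to their Latin lookalikes.
HOMOGLYPHS = {"\u0430": "a", "\u0435": "e", "\u043e": "o"}

def normalize_word(word):
    """Drop invisible characters and map known homoglyphs to ASCII."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in word if ch not in ZERO_WIDTH)

def deobfuscate(text):
    """Return the normalized text plus the two pseudo-headers.

    Only words that actually changed are recorded in the headers.
    """
    out, orig, norm = [], [], []
    for word in text.split(" "):
        clean = normalize_word(word)
        out.append(clean)
        if clean != word:
            orig.append(word)
            norm.append(clean)
    headers = {
        "X-Obfuscated-Orig": " ".join(orig),
        "X-Obfuscated-Norm": " ".join(norm),
    }
    return " ".join(out), headers

# Cyrillic 'a' in "malware", zero-width spaces inside "has" and "address".
sample = "my m\u0430lware h\u200bas copied your addr\u200bess book"
body, headers = deobfuscate(sample)
print(body)                          # my malware has copied your address book
print(headers["X-Obfuscated-Norm"])  # malware has address
```

Body rules would then see the clean text, while the original forms stay available in X-Obfuscated-Orig for rules that want to test for the trick itself.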
Re: spoofing mail
On Fri, 30 Nov 2018 at 3:06, Matus UHLAR - fantomas wrote:
> And, yes, there could be rule that catches message-id added by internal
> server. Note that:
> - Message-ID is not required (has SHOULD in RFC)
> - many mailservers add message-id if it doesn't exist.
>
> >> https://pastebin.com/ktMUDLps
>
> not available anymore :-(

Hi, here it is: https://pastebin.com/3TtsjXSX

Last trace, after my gateway analyzes it: https://pastebin.com/76rNVnnp

-- 
rickygm
http://gnuforever.homelinux.com
Txrep problem
Hello all!

I have tried to implement TxRep into my system. My configuration for it is:

# Enable awl
user_awl_dsn DBI:mysql:spamassassin:spamassassin
user_awl_sql_username spamassassin
user_awl_sql_password amazing
use_txrep 1

My v341.pre says:

# TxRep - Reputation database that replaces AWL
loadplugin Mail::SpamAssassin::Plugin::TxRep

spamassassin -D --lint reports no problems.

I have a database in MySQL named "spamassassin" and there I have the table txrep as:

+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| username | varchar(100) | NO   | PRI |         |       |
| email    | varchar(255) | NO   | PRI |         |       |
| ip       | varchar(40)  | NO   | PRI |         |       |
| count    | int(11)      | NO   |     | 0       |       |
| totscore | float        | NO   |     | 0       |       |
| signedby | varchar(255) | NO   | PRI |         |       |
+----------+--------------+------+-----+---------+-------+
6 rows in set (0.00 sec)

The table is empty! In addition, today I saw a message that was pretty hammy, but it got 8 points from TxRep and was marked as spam.

What gave that score, and why do I not get anything into the table txrep? The table txrep is the only table in the MariaDB database spamassassin, as my Bayes data is in Redis.

Thanks,
jarif
openssl 1.1.1 , FreeBSd 11.2 and spamassassin-3.4.2_2
Just ran sa-update using gnupg2 and got

channel: SHA512 verification failed, channel failed

Why did that happen?

-- 
Member - Liberal International
This is doctor@@nl2k.ab.ca Ici doctor@@nl2k.ab.ca
Yahweh, Queen & country! Never Satan President Republic! Beware AntiChrist rising!
https://www.empire.kred/ROOTNK?t=94a1f39b
Look at Psalms 14 and 53 on Atheism
Merry Christmas 2018 and Happy New Year 2019!!
Re: openssl 1.1.1 , FreeBSd 11.2 and spamassassin-3.4.2_2
On 30 Nov 2018, at 15:17, The Doctor wrote:

> Just ran sa-update using gnupg2
>
> and got
>
> channel: SHA512 verification failed, channel failed
>
> Why did that happen?

Because the SHA512 verification of an update file failed, causing the channel to fail. Just like it says.

If you give sa-update the "-D" option, you will get a verbose description of everything sa-update is doing, which will make more useful details regarding the failure available. There is even a strong chance that a second attempt will not fail, since some known failure modes are inherently transient.

-- 
Bill Cole
Re: spoofing mail
Although the RFC allows MUAs not to include a Message-ID, the same RFC does not mandate that MTAs accept such messages. Since 100% of such emails in our records are spam, we reject them upfront. I understand that spammers and scammers hate our policy, but hey, who cares, right? Our inbox, our rules.

On Fri, Nov 30, 2018 at 10:06, Matus UHLAR - fantomas wrote:

> On 29.11.18 09:30, Rupert Gallagher wrote:
>> Message-ID and To have the same domain, but From does not. You should have
>> never received that mail.
>
> this happens when message-id is added by mailserver of the recipient.
> Should hit MSGID_FROM_MTA_HEADER.
>
> And, yes, there could be rule that catches message-id added by internal
> server. Note that:
> - Message-ID is not required (has SHOULD in RFC)
> - many mailservers add message-id if it doesn't exist.
>
>> On Wed, Nov 28, 2018 at 19:15, Rick Gutierrez wrote:
>>
>>> On Wed, 28 Nov 2018 at 6:03, Christian Grunfeld wrote:
>>>> Hi, this is a log. Could you paste the email headers? cheers
>>>
>>> I do not know if it is useful, the amavisd + spamassassin I have it in
>>> front of the mail server.
>>>
>>> https://pastebin.com/ktMUDLps
>
> not available anymore :-(
Re: openssl 1.1.1 , FreeBSd 11.2 and spamassassin-3.4.2_2
On Fri, Nov 30, 2018 at 04:08:36PM -0500, Bill Cole wrote:
> On 30 Nov 2018, at 15:17, The Doctor wrote:
>
> > Just ran sa-update using gnupg2
> >
> > and got
> >
> > channel: SHA512 verification failed, channel failed
> >
> > Why did that happen?
>
> Because the SHA512 verification of an update file failed, causing the
> channel to fail. Just like it says.
>
> If you give sa-update the "-D" option, you will get a verbose
> description of everything sa-update is doing, which will make more
> useful details regarding the failure available. There is even a strong
> chance that a second attempt will not fail, since some known failure
> modes are inherently transient.
>

I will stick with what you said

sa-update -D
Nov 30 14:53:12.329 [74107] dbg: logger: adding facilities: all
Nov 30 14:53:12.329 [74107] dbg: logger: logging level is DBG
Nov 30 14:53:12.329 [74107] dbg: generic: SpamAssassin version 3.4.2
Nov 30 14:53:12.329 [74107] dbg: generic: Perl 5.026002, PREFIX=/usr/local, DEF_RULES_DIR=/usr/local/share/spamassassin, LOCAL_RULES_DIR=/usr/local/etc/mail/spamassassin, LOCAL_STATE_DIR=/var/db/spamassassin
Nov 30 14:53:12.329 [74107] dbg: config: timing enabled
Nov 30 14:53:12.334 [74107] dbg: config: score set 0 chosen.
Nov 30 14:53:12.349 [74107] dbg: generic: sa-update version 3.4.2 / svn1840377
Nov 30 14:53:12.349 [74107] dbg: generic: using update directory: /var/db/spamassassin/3.004002
Nov 30 14:53:12.770 [74107] dbg: diag: perl platform: 5.026002 freebsd
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Digest::SHA, version 5.96
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: HTML::Parser, version 3.72
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Net::DNS, version 1.19
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: NetAddr::IP, version 4.079
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Time::HiRes, version 1.9741
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Archive::Tar, version 2.24
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: IO::Zlib, version 1.10
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Digest::SHA1, version 2.13
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: MIME::Base64, version 3.15
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: DB_File, version 1.84
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Net::SMTP, version 3.10
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Mail::SPF, version v2.009
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Geo::IP, version 1.51
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Net::CIDR::Lite, version 0.21
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Razor2::Client::Agent, version 2.84
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: IO::Socket::IP, version 0.38
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: IO::Socket::INET6, version 2.72
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: IO::Socket::SSL, version 2.060
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Compress::Zlib, version 2.074
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Mail::DKIM, version 0.54
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: DBI, version 1.642
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Getopt::Long, version 2.49
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: LWP::UserAgent, version 6.36
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: HTTP::Date, version 6.02
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Encode::Detect::Detector, version 1.01
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Net::Patricia, version 1.22
Nov 30 14:53:12.772 [74107] dbg: diag: [...] module installed: Net::DNS::Nameserver, version 1692
Nov 30 14:53:12.772 [74107] dbg: diag: [...] module installed: BSD::Resource, version 1.2911
Nov 30 14:53:12.773 [74107] dbg: gpg: Searching for 'gpg'
Nov 30 14:53:12.774 [74107] dbg: util: current PATH is: /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/root/bin
Nov 30 14:53:12.774 [74107] dbg: util: executable for gpg was found at /usr/local/bin/gpg
Nov 30 14:53:12.774 [74107] dbg: gpg: found /usr/local/bin/gpg
Nov 30 14:53:12.782 [74107] dbg: gpg: importing default keyring to /usr/local/etc/mail/spamassassin/sa-update-keys
Nov 30 14:53:12.797 [74107] dbg: gpg: [GNUPG:] IMPORT_OK 0 5E541DC959CB8BAC7C78DFDC4056A61A5244EC45
Nov 30 14:53:12.797 [74107] dbg: gpg: [GNUPG:] IMPORT_RES 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Nov 30 14:53:12.797 [74107] dbg: gpg: release trusted key id list: 0C2B1D7175B852C64B3CDC716C55397824F434CE 5E541DC959CB8BAC7C78DFDC4056A61A5244EC45
Nov 30 14:53:12.808 [74107] dbg: util: secure_tmpfile created a temporary file /tmp/.spamassassin74107JqCXOVtmp
Nov 30 14:53:12.808 [7
Re: Bayes underperforming, HTML entities?
On Nov 30, 2018, at 7:00 AM, Bill Cole wrote:
>
>> Since HTML is already getting rendered to text, then perhaps the conversion
>> code should strip (literally, just delete) any zero-width characters during
>> this conversion? That should make normal body rules, and Bayes, function
>> properly, no?
>
> Not if they are *looking for* those characters.

But AFAIK we're only looking for those characters with rawbody rules, because it's really hard to search for them in regular body rules... no? I'm not trying to advocate for removal of rawbody rules, but rather making it easier for normal body rules to work. But RW's suggestion is probably a good one: offer both paths.

On Nov 30, 2018, at 7:46 AM, RW wrote:
>
> It make it harder to write rules detecting these tricks, but it may
> happen eventually. As far as Bayes is concerned, it would be a shame to
> lose the information.

I'm not sure I see how Bayes can take decent advantage of these zero-width chars. If they are interspersed randomly within words, then Bayes has to tokenize each and every permutation (or, at least, very many permutations) of each word in order to be decently effective. But if the zero-width chars are stripped out, then Bayes only has to tokenize the regular, displayable word. Am I missing something?

But offering both converted and non-converted options is likely the best option, and then having Bayes work on the normalized version resolves the above.

--- Amir
Re: openssl 1.1.1 , FreeBSd 11.2 and spamassassin-3.4.2_2
On 30 Nov 2018, at 16:57, The Doctor wrote:

> On Fri, Nov 30, 2018 at 04:08:36PM -0500, Bill Cole wrote:
>> On 30 Nov 2018, at 15:17, The Doctor wrote:
>>
>>> Just ran sa-update using gnupg2
>>>
>>> and got
>>>
>>> channel: SHA512 verification failed, channel failed
>>>
>>> Why did that happen?
>>
>> Because the SHA512 verification of an update file failed, causing the
>> channel to fail. Just like it says.
>>
>> If you give sa-update the "-D" option, you will get a verbose
>> description of everything sa-update is doing, which will make more
>> useful details regarding the failure available. There is even a strong
>> chance that a second attempt will not fail, since some known failure
>> modes are inherently transient.
>
> I will stick with what you said
>
> sa-update -D

[...]

Looks normal until near the end:

> Nov 30 14:53:15.964 [74107] dbg: channel: selected mirror http://sa-update.spamassassin.org
> Nov 30 14:53:15.964 [74107] dbg: http: url: http://sa-update.spamassassin.org/1847701.tar.gz
> Nov 30 14:53:15.964 [74107] dbg: http: downloading to: /var/db/spamassassin/3.004002/updates_spamassassin_org/1847701.tar.gz, update
> Nov 30 14:53:15.964 [74107] dbg: util: executable for curl was found at /usr/local/bin/curl
> Nov 30 14:53:15.965 [74107] dbg: http: /usr/local/bin/curl -s -L -O --remote-time -g --max-redirs 2 --connect-timeout 30 --max-time 300 --fail -o 1847701.tar.gz -z 1847701.tar.gz -- http://sa-update.spamassassin.org/1847701.tar.gz
> Nov 30 14:53:18.418 [74107] dbg: http: process [74232], exit status: exit 0
> Nov 30 14:53:18.420 [74107] dbg: http: url: http://sa-update.spamassassin.org/1847701.tar.gz.sha512
> Nov 30 14:53:18.420 [74107] dbg: http: downloading to: /var/db/spamassassin/3.004002/updates_spamassassin_org/1847701.tar.gz.sha512, update
> Nov 30 14:53:18.421 [74107] dbg: util: executable for curl was found at /usr/local/bin/curl
> Nov 30 14:53:18.421 [74107] dbg: http: /usr/local/bin/curl -s -L -O --remote-time -g --max-redirs 2 --connect-timeout 30 --max-time 300 --fail -o 1847701.tar.gz.sha512 -z 1847701.tar.gz.sha512 -- http://sa-update.spamassassin.org/1847701.tar.gz.sha512
> Nov 30 14:53:20.259 [74107] dbg: http: process [74286], exit status: exit 0
> Nov 30 14:53:20.260 [74107] dbg: http: url: http://sa-update.spamassassin.org/1847701.tar.gz.sha256
> Nov 30 14:53:20.260 [74107] dbg: http: downloading to: /var/db/spamassassin/3.004002/updates_spamassassin_org/1847701.tar.gz.sha256, update
> Nov 30 14:53:20.260 [74107] dbg: util: executable for curl was found at /usr/local/bin/curl
> Nov 30 14:53:20.260 [74107] dbg: http: /usr/local/bin/curl -s -L -O --remote-time -g --max-redirs 2 --connect-timeout 30 --max-time 300 --fail -o 1847701.tar.gz.sha256 -z 1847701.tar.gz.sha256 -- http://sa-update.spamassassin.org/1847701.tar.gz.sha256
> Nov 30 14:53:22.161 [74107] dbg: http: process [74329], exit status: exit 0
> Nov 30 14:53:22.162 [74107] dbg: http: url: http://sa-update.spamassassin.org/1847701.tar.gz.asc
> Nov 30 14:53:22.162 [74107] dbg: http: downloading to: /var/db/spamassassin/3.004002/updates_spamassassin_org/1847701.tar.gz.asc, update
> Nov 30 14:53:22.163 [74107] dbg: util: executable for curl was found at /usr/local/bin/curl
> Nov 30 14:53:22.163 [74107] dbg: http: /usr/local/bin/curl -s -L -O --remote-time -g --max-redirs 2 --connect-timeout 30 --max-time 300 --fail -o 1847701.tar.gz.asc -z 1847701.tar.gz.asc -- http://sa-update.spamassassin.org/1847701.tar.gz.asc
> Nov 30 14:53:23.603 [74107] dbg: http: process [74380], exit status: exit 0
> Nov 30 14:53:23.607 [74107] dbg: sha512: verification wanted: ae6c6249e8a63d4512331ec91e42bf0ba6ead2f8ba323200ebbfe4ed44bf9902635c7ecc7a3b392bdaddc96f070f8fd0293475dace317923854a32ba5238d93d

That's the content of the downloaded 1847701.tar.gz.sha512 file, which is the SHA512 hash of the 1847701.tar.gz on the update servers. It matches the content of the same file that I just retrieved from the update server, so your transfer of that file worked.

> Nov 30 14:53:23.607 [74107] dbg: sha512: verification result: 88fd9fa22e55c00365b8d0548a7ce8fc8c5ac08c339ca383663b5b735337b2ef2a52a83021b6608f186b4163556a8b8d9ecef14c775717294607925577a0dd9f

That's the actual SHA512 hash of the downloaded 1847701.tar.gz file. Obviously it does not match the hash of that file on the server, so there was something wrong with the download.

I've just downloaded 1847701.tar.gz myself from the same server and verified that my download DID verify, unpack correctly, and match what sa-update installed for me last night. So the problem is not with the files on the server but rather specifically with the download process or storage on your system, resulting in a corrupted 1847701.tar.gz file.

When a channel fails to verify, sa-update refrains from deleting any of the files downloaded for that channel. As indicated above, those were all downloaded to /var/db/spamassassin/3.004002/updates_spamassassin_org/ and so should still be present. Check the size of the 1847701.tar.gz file
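The "wanted" versus "result" comparison in the debug output can be reproduced by hand against the files left in the update directory. Below is a minimal Python sketch of such a check, not sa-update's own code: the function names are made up, and the assumption that the first whitespace-separated token of the .sha512 file is the hex digest should be confirmed by looking at the file.

```python
import hashlib

def sha512_of_file(path, chunk_size=65536):
    """Return the hex SHA512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def read_expected(sha512_path):
    """Assumption: the first whitespace-separated token is the hex digest."""
    with open(sha512_path, "r", encoding="ascii") as fh:
        return fh.read().split()[0].lower()

def verify(tarball_path, sha512_path):
    """Return (matches, wanted, result), mirroring the two debug lines."""
    wanted = read_expected(sha512_path)
    result = sha512_of_file(tarball_path)
    return wanted == result, wanted, result
```

Calling `verify()` on the downloaded 1847701.tar.gz and 1847701.tar.gz.sha512 should show the same mismatch as the log if the tarball on disk really is corrupted, and a match after a clean re-download.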
Re: Bayes underperforming, HTML entities?
On 30 Nov 2018, at 17:49, Amir Caspi wrote:

> On Nov 30, 2018, at 7:00 AM, Bill Cole wrote:
>>> Since HTML is already getting rendered to text, then perhaps the
>>> conversion code should strip (literally, just delete) any zero-width
>>> characters during this conversion? That should make normal body
>>> rules, and Bayes, function properly, no?
>>
>> Not if they are *looking for* those characters.
>
> But AFAIK we're only looking for those characters with rawbody rules,

Not so.

> because it's really hard to search for them in regular body rules...
> no?

No.

See the relevant rule cluster (all with 'ZW' in their names) in KAM.cf and __UNICODE_OBFU_ZW in the standard ruleset. Also see my more generic (but still useful!) __SCC_SHORT_WORDS and derivatives in KAM.cf: it is a body rule that takes advantage of the fact that zero-width typographical control characters create logical word breaks as far as Perl is concerned.

-- 
Bill Cole
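The word-break effect can be illustrated outside Perl. The sketch below uses Python's `re` module rather than Perl, and it is not the definition of __SCC_SHORT_WORDS (which is not shown in this thread); it only demonstrates the general idea that a zero-width space (U+200B) is not a word character, so `\b` matches on both sides of it and an obfuscated word falls apart into short fragments. The pattern, the three-character limit, and the sample text are assumptions for illustration.

```python
import re

# A "short word": one to three word characters between word boundaries.
SHORT_WORD = re.compile(r"\b\w{1,3}\b")

def count_short_words(text):
    """Count the short word fragments in `text`."""
    return len(SHORT_WORD.findall(text))

clean = "account suspended"
obfuscated = "ac\u200bco\u200bunt sus\u200bpen\u200bded"

print(count_short_words(clean))       # 0
print(count_short_words(obfuscated))  # 6
```

A rule built on this idea would fire when the count (or density) of short fragments is abnormally high, without ever naming the invisible characters themselves.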
Re: Bayes underperforming, HTML entities?
On Fri, 30 Nov 2018 15:49:31 -0700 Amir Caspi wrote:

> > It make it harder to write rules detecting these tricks, but it may
> > happen eventually. As far as Bayes is concerned, it would be a
> > shame to lose the information.
>
> I'm not sure I see how Bayes can take decent advantage out of these
> zero-width chars. If they are interspersed randomly within words,
> then Bayes has to tokenize each and every permutation (or, at least,
> very many permutations) of each word in order to be decently
> effective. But if the zero-width chars are stripped out, then Bayes
> only has to tokenize the regular, displayable word. Am I missing
> something?

Yes, you need something in between: a tokenization that avoids learning the hundreds of obfuscation variants, but doesn't throw away the existence of obfuscation.

> But offering both converted and non-converted options is likely the
> best option, and then having Bayes work on the normalized version
> resolves the above.

Not simply on the normalized text; that way you lose information. In the example I gave, the obfuscated word 'has' would get tokenized twice, once through the body and once through the list of obfuscated words in the pseudo-header, producing the tokens:

  'has'
  'HX-Obfuscated-Norm:has'

The former token would likely be neutral and drop out, but the second would probably only appear in spam.

The upshot of this is that invisible obfuscation:

- no longer breaks body rules
- is easier for Bayes to learn than non-obfuscated text
- can still be tested via X-Obfuscated-Orig without the complexity of rawbody
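The double tokenization described above might look roughly like the following Python sketch. This is not SpamAssassin's Bayes tokenizer, which is considerably more elaborate; the function name and the whitespace splitting are simplifications, and only the "HX-Obfuscated-Norm:" prefix is taken from the example in the message.

```python
def bayes_tokens(normalized_body, obfuscated_norm_header):
    """Tokenize the normalized body, then add prefixed tokens for each
    word listed in the X-Obfuscated-Norm pseudo-header."""
    tokens = normalized_body.split()
    tokens += ["HX-Obfuscated-Norm:" + word
               for word in obfuscated_norm_header.split()]
    return tokens

tokens = bayes_tokens(
    "my malware has copied your address book",
    "malware has address",
)
print(tokens)
```

The plain token 'has' appears in ham and spam alike, so it carries little weight; 'HX-Obfuscated-Norm:has' exists only when the word was obfuscated, so Bayes needs to learn one token per word rather than one per obfuscation variant.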
Re: spoofing mail
On Fri, 30 Nov 2018, Rupert Gallagher wrote:

> Although the RFC allows muas not to include the mid, the same RFC does
> not mandate mtas to accept them. Since 100% of such emails on our
> records are spam, then we reject them upfront.

...and if you're adopting that policy, then configure your MTA to reject messages missing a Message-ID during the SMTP phase, before they ever touch SA.

-- 
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----
 610 days since the first commercial re-flight of an orbital booster (SpaceX)