Re: spoofing mail

2018-11-30 Thread Matus UHLAR - fantomas

On 29.11.18 09:30, Rupert Gallagher wrote:

Message-ID and To have the same domain, but From does not.  You should have
never received that mail.


this happens when message-id is added by mailserver of the recipient.
Should hit MSGID_FROM_MTA_HEADER.

And, yes, there could be rule that catches message-id added by internal
server. Note that:
- Message-ID is not required (has SHOULD in RFC)
- many mailservers add message-id if it doesn't exist.
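
A minimal sketch of the header comparison discussed above, written in Python rather than as a SpamAssassin rule. It only illustrates the idea "Message-ID shares its domain with To but not with From"; the function names and the sample addresses are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(value):
    """Return the lower-cased part after the last '@', or '' if there is none."""
    value = (value or "").strip().strip("<>")
    return value.rpartition("@")[2].lower() if "@" in value else ""


def msgid_looks_locally_added(raw_headers):
    """True when Message-ID shares its domain with To but not with From."""
    msg = message_from_string(raw_headers)
    mid_dom = domain_of(msg.get("Message-ID"))
    to_dom = domain_of(parseaddr(msg.get("To", ""))[1])
    from_dom = domain_of(parseaddr(msg.get("From", ""))[1])
    return bool(mid_dom) and mid_dom == to_dom and mid_dom != from_dom


# Hypothetical sample: the recipient's server appears to have added the Message-ID.
sample = (
    "From: Someone <someone@sender.example>\n"
    "To: user@rcpt.example\n"
    "Message-ID: <20181129.1234@rcpt.example>\n"
    "\n"
    "body\n"
)
print(msgid_looks_locally_added(sample))
```

A message with no Message-ID at all returns False here; that case would need its own check.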


On Wed, Nov 28, 2018 at 19:15, Rick Gutierrez  wrote:


On Wed, 28 Nov 2018 at 6:03, Christian Grunfeld wrote:


Hi,

this is a log... could you paste the email headers?

cheers


I do not know if it is useful, the amavisd + spamassassin I have it in
front of the mail server.

https://pastebin.com/ktMUDLps


not available anymore :-(
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The 3 biggets disasters: Hiroshima 45, Tschernobyl 86, Windows 95


Re: Bayes underperforming, HTML entities?

2018-11-30 Thread RW
On Thu, 29 Nov 2018 22:33:12 -0700
Amir Caspi wrote:

> On Nov 29, 2018, at 10:11 PM, Bill Cole
>  wrote:
> > 
> > I have no issue with adding a new rule type to act on the output of
> > a partial well-defined HTML parsing, something in between 'rawbody'
> > and 'body' types, but overloading normalize_charset with that and
> > so affecting every existing rule of all body-oriented rule types
> > would be a bad design.  
> 
> The problem as I see it is that spammers are using HTML encoding as
> effectively another charset, and as a way of obfuscating like they
> did/do with Unicode lookalikes... but unless those HTML characters
> are translated there is no way to catch this obfuscation.

normalize_charset is about converting  text from whatever character set
it's in to UTF-8, and nothing else. SpamAssassin should already decode
HTML to text for body rules. Rules matching the HTML entities use
rawbody specifically to avoid having them converted to plain text.

The most substantial problem here is that these invisible characters
make it very hard to write ordinary body rules.
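
To illustrate the problem in isolation, here is a small Python sketch (not SpamAssassin code). It assumes the spammer encodes U+200B ZERO WIDTH SPACE as the numeric entity `&#8203;`; once the entity is decoded, the word still displays as "malware" but an ordinary pattern no longer matches it.

```python
import html
import re

# "&#8203;" is the HTML numeric entity for U+200B ZERO WIDTH SPACE.
raw_html_text = "ma&#8203;lwa&#8203;re"

# Decoding entities is roughly what rendering HTML to text does to them.
rendered = html.unescape(raw_html_text)

# An "ordinary" rule, written as if the text were clean.
plain_rule = re.compile(r"\bmalware\b", re.IGNORECASE)

print(len(rendered))                      # 9 code points, only 7 of them visible
print(bool(plain_rule.search(rendered)))  # False: the invisible characters split the word
```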


Re: Bayes underperforming, HTML entities?

2018-11-30 Thread Amir Caspi
On Nov 30, 2018, at 6:09 AM, RW  wrote:
> 
> The most substantial problem here is that these invisible characters
> make it very hard to write ordinary body rules.

Thanks for the clarification on my confusion. Since HTML is already getting 
rendered to text, then perhaps the conversion code should strip (literally, 
just delete) any zero-width characters during this conversion? That should make 
normal body rules, and Bayes, function properly, no?

Is there a reason not to strip out zero-width characters? That is, is there any 
benefit or reason to maintain invisible chars versus throwing them out?
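
As a rough sketch of what "just delete them" could look like (Python, not the actual SpamAssassin rendering code), assuming a short and deliberately incomplete list of zero-width code points. Returning the number of deletions keeps at least a trace of the fact that something was removed.

```python
import re

# A small, deliberately incomplete set of zero-width code points:
# ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER,
# WORD JOINER and ZERO WIDTH NO-BREAK SPACE.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def strip_zero_width(text):
    """Return (cleaned_text, number_of_characters_removed)."""
    cleaned, removed = ZERO_WIDTH.subn("", text)
    return cleaned, removed


cleaned, removed = strip_zero_width("ma\u200blwa\u200bre has copied your address book")
print(cleaned)   # malware has copied your address book
print(removed)   # 2
```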

Thanks!

--- Amir


Re: --virtual-config-dir=pattern is not substituted

2018-11-30 Thread Bill Cole

On 29 Nov 2018, at 8:06, Eggert Ehmke wrote:

Strange, I am missing that configuration in /etc/postfix/master.cf.
Will add them.


Please be careful. It is *possible* to have SpamAssassin hooked into the 
mail acceptance and delivery flow in many different ways; I only 
(vaguely) described the most common way that I've seen for doing so with 
the "spamd" daemon involved, which is indicated by your use of 
'--virtual-config-dir' and a log file named 'spamd.log'.


If your configuration was completely missing any relevant entry in 
master.cf, it could be that SA was being used via Dovecot or via an 
access map FILTER result. Make sure you understand how your plumbing 
works before changing it.




On Thursday, 29 November 2018, 01:15:39 CET, Bill Cole wrote:

On 28 Nov 2018, at 17:53, Eggert Ehmke wrote:

Do you mean the --username option in /etc/default/spamassassin?


No. Postfix is running the 'spamc' program in some fashion, usually via
a pipe transport configured in master.cf. That transport (typically an
intermediary script) needs to be passed the recipient address by Postfix
and may need to transform it in some fashion (e.g. strip the domain
maybe) to use it as the argument to the '-u' option in an invocation of
spamc.
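
A minimal sketch of the transformation such an intermediary script might perform, in Python. The function name and the strip_domain switch are hypothetical; the only spamc option used is '-u'. How Postfix hands over the recipient depends on your own master.cf and is not shown.

```python
def spamc_argv(recipient, strip_domain=False):
    """Build the argument list for spamc from the recipient Postfix passes in."""
    user = recipient.strip()
    if strip_domain:
        # Keep only the local part, e.g. for system (non-virtual) accounts.
        user = user.partition("@")[0]
    if not user:
        raise ValueError("no recipient was passed, so there is nothing to give to -u")
    return ["spamc", "-u", user]


# Hypothetical recipient address, used only for illustration.
print(" ".join(spamc_argv("user@example.org")))
print(" ".join(spamc_argv("user@example.org", strip_domain=True)))
```

For a pattern that uses both %d and %l, the full address (domain included) is what needs to reach '-u'.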


It is set to the generic user --username=debian-spamd


Thank you

On Wednesday, 28 November 2018, 22:41:38 CET, RW wrote:

On Tue, 27 Nov 2018 18:01:04 +0100
Eggert Ehmke wrote:

I have Spamassassin running on Debian with Postfix, Dovecot etc. It
seems to work, Spam is filtered to my Quarantine. I have some virtual
mailboxes in /var/mail/vhosts and have set up the Option

-x --virtual-config-dir=/var/mail/vhosts/%d/%l/spamassassin

This does not work, in the log file /var/log/spamassassin/spamd.log
I find these lines:

warn: plugin: eval failed: bayes: (in learn) locker: safe_lock:
cannot create tmp lockfile
/var/mail/vhosts///spamassassin/bayes.lock.domain.de.3653
for /var/mail/vhosts///spa

So the user name and the domain are not replaced in the pattern.
What may be wrong?


Are you sure the recipient address is being passed to spamc via the
-u option?
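
To show why the username matters, here is an illustrative Python sketch of pattern expansion. It is not spamd's code. It assumes %u is the full username, %l the part before the '@' and %d the part after it, and that a username without '@' leaves %l and %d empty, which is consistent with the doubled slashes in the log above. The address in the second call is hypothetical.

```python
def expand_pattern(pattern, username):
    """Illustrative expansion of %u, %l, %d and %% in a directory pattern."""
    local, at, domain = username.partition("@")
    if not at:
        # Assumption: a name without '@' has no local part or domain to offer.
        local, domain = "", ""
    values = {"u": username, "l": local, "d": domain, "%": "%"}
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern) and pattern[i + 1] in values:
            out.append(values[pattern[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


PATTERN = "/var/mail/vhosts/%d/%l/spamassassin"
print(expand_pattern(PATTERN, "debian-spamd"))      # /var/mail/vhosts///spamassassin
print(expand_pattern(PATTERN, "eggert@domain.de"))  # /var/mail/vhosts/domain.de/eggert/spamassassin
```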



--
Bill Cole


Re: Bayes underperforming, HTML entities?

2018-11-30 Thread Bill Cole

On 30 Nov 2018, at 8:29, Amir Caspi wrote:


On Nov 30, 2018, at 6:09 AM, RW  wrote:


The most substantial problem here is that these invisible characters
make it very hard to write ordinary body rules.


Thanks for the clarification on my confusion. Since HTML is already 
getting rendered to text, then perhaps the conversion code should 
strip (literally, just delete) any zero-width characters during this 
conversion? That should make normal body rules, and Bayes, function 
properly, no?


Not if they are *looking for* those characters.

Is there a reason not to strip out zero-width characters? That is, is 
there any benefit or reason to maintain invisible chars versus 
throwing them out?


The presence of zero-width characters is a very strong spam indicator. 
It isn't quite perfect however, since at least one procedurally 
legitimate and rather popular US entity is sending mail that people 
affirmatively want to receive like this: 
https://www.scconsult.com/atkspam.txt


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: Bayes underperforming, HTML entities?

2018-11-30 Thread RW
On Fri, 30 Nov 2018 06:29:31 -0700
Amir Caspi wrote:

> On Nov 30, 2018, at 6:09 AM, RW  wrote:
> > 
> > The most substantial problem here is that these invisible characters
> > make it very hard to write ordinary body rules.  
> 
> Thanks for the clarification on my confusion. Since HTML is already
> getting rendered to text, then perhaps the conversion code should
> strip (literally, just delete) any zero-width characters during this
> conversion? That should make normal body rules, and Bayes, function
> properly, no?
> 
> Is there a reason not to strip out zero-width characters? That is, is
> there any benefit or reason to maintain invisible chars versus
> throwing them out?

It makes it harder to write rules detecting these tricks, but it may
happen eventually. As far as Bayes is concerned, it would be a shame to
lose the information.

What I think might be a good compromise is to normalize out all
invisible and high quality obfuscations, but add the original and
normalized  words to two metadata headers.

So, if  represent a homoglyph for 'a' and  is an invisible
character, then the text

   my mlware has copied your address book

would be converted to 

   my malware has copied your address book

with the generation of

X-Obfuscated-Orig: mlware has  address
X-Obfuscated-Norm: malware has address

It would be possible to run headers rules against either pseudo header.

Bayes would ignore X-Obfuscated-Orig and tokenize X-Obfuscated-Norm with
a dedicated prefix. Most common English words from that header would be
strongly spammy.
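
A rough Python sketch of the proposal, to make the two pseudo-headers concrete. The header names come from the text above; everything else is an assumption for illustration: U+200B stands in for the invisible character, Cyrillic U+0430 for the homoglyph of 'a', and the function names are hypothetical.

```python
# Assumptions for this sketch: U+200B is the "invisible character" and
# Cyrillic U+0430 is the homoglyph for 'a'. A real table would be far larger.
INVISIBLE = "\u200b"
HOMOGLYPHS = {"\u0430": "a"}


def normalize_word(word):
    """Drop invisible characters and map known homoglyphs to ASCII."""
    word = word.replace(INVISIBLE, "")
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in word)


def deobfuscate(text):
    """Return (normalized_text, pseudo_headers) in the spirit of the proposal."""
    orig, norm, out = [], [], []
    for word in text.split(" "):
        clean = normalize_word(word)
        if clean != word:
            # Only words that were actually obfuscated go into the headers.
            orig.append(word)
            norm.append(clean)
        out.append(clean)
    headers = {
        "X-Obfuscated-Orig": " ".join(orig),
        "X-Obfuscated-Norm": " ".join(norm),
    }
    return " ".join(out), headers


body = "my m\u0430lware h\u200bas copied your addr\u200bess book"
text, headers = deobfuscate(body)
print(text)                          # my malware has copied your address book
print(headers["X-Obfuscated-Norm"])  # malware has address
```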


Re: spoofing mail

2018-11-30 Thread Rick Gutierrez
On Fri, 30 Nov 2018 at 3:06, Matus UHLAR - fantomas wrote:

> And, yes, there could be rule that catches message-id added by internal
> server. Note that:
> - Message-ID is not required (has SHOULD in RFC)
> - many mailservers add message-id if it doesn't exist.
>

> >>
> >> https://pastebin.com/ktMUDLps
>
> not available anymore :-(
> --


Hi, here it is: https://pastebin.com/3TtsjXSX

Last trace, after my gateway analyzes it:

https://pastebin.com/76rNVnnp


-- 
rickygm

http://gnuforever.homelinux.com


Txrep problem

2018-11-30 Thread Jari Fredriksson
Hello all!

I have tried to implement TxRep into my system.

My configuration for it is

# Enable awl
user_awl_dsn            DBI:mysql:spamassassin:spamassassin
user_awl_sql_username   spamassassin
user_awl_sql_password   amazing

use_txrep 1


My v341.pre says

# TxRep - Reputation database that replaces AWL
loadplugin Mail::SpamAssassin::Plugin::TxRep

spamassassin -D --lint reports no problems.

I have a database in MySQL named "spamassassin", and in it the table txrep is 
defined as

+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| username | varchar(100) | NO   | PRI |         |       |
| email    | varchar(255) | NO   | PRI |         |       |
| ip       | varchar(40)  | NO   | PRI |         |       |
| count    | int(11)      | NO   |     | 0       |       |
| totscore | float        | NO   |     | 0       |       |
| signedby | varchar(255) | NO   | PRI |         |       |
+----------+--------------+------+-----+---------+-------+
6 rows in set (0.00 sec)

The table is empty!

In addition, today I saw a message that was pretty hammy, but it got a 
score of 8 points from TxRep and was marked as spam. What gave it that score, 
and why does nothing get written into the table txrep?

The table txrep is the only table in MariaDB database spamassassin, as my bayes 
is in Redis.

Thanks, jarif


openssl 1.1.1 , FreeBSd 11.2 and spamassassin-3.4.2_2

2018-11-30 Thread The Doctor
Just ran sa-update  using gnupg2

and got

channel: SHA512 verification failed, channel failed

Why did that happen?

-- 
Member - Liberal International This is doctor@@nl2k.ab.ca Ici doctor@@nl2k.ab.ca
Yahweh, Queen & country!Never Satan President Republic!Beware AntiChrist rising!
https://www.empire.kred/ROOTNK?t=94a1f39b  Look at Psalms 14 and 53 on Atheism
sMerry Christmas 2018 and Happy New Year 2019!!


Re: openssl 1.1.1 , FreeBSd 11.2 and spamassassin-3.4.2_2

2018-11-30 Thread Bill Cole

On 30 Nov 2018, at 15:17, The Doctor wrote:


Just ran sa-update  using gnupg2

and got

channel: SHA512 verification failed, channel failed

Why did that happen?


Because the SHA512 verification of an update file failed, causing the 
channel to fail. Just like it says.


If you give sa-update the "-D" option, you will get a verbose 
description of everything sa-update is doing, which will make more 
useful details regarding the failure available. There is even a strong 
chance that a second attempt will not fail, since some known failure 
modes are inherently transient.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: spoofing mail

2018-11-30 Thread Rupert Gallagher
Although the RFC allows MUAs not to include the Message-ID, the same RFC does 
not mandate MTAs to accept such messages. Since 100% of such emails on our 
records are spam, we reject them upfront. I understand that spammers and 
scammers hate our policy, but hey, who cares, right? Our inbox, our rules.

On Fri, Nov 30, 2018 at 10:06, Matus UHLAR - fantomas  wrote:

> On 29.11.18 09:30, Rupert Gallagher wrote:
>>Message-ID and To have the same domain, but From does not. You should have
>> never received that mail.
>
> this happens when message-id is added by mailserver of the recipient.
> Should hit MSGID_FROM_MTA_HEADER.
>
> And, yes, there could be rule that catches message-id added by internal
> server. Note that:
> - Message-ID is not required (has SHOULD in RFC)
> - many mailservers add message-id if it doesn't exist.
>
>>On Wed, Nov 28, 2018 at 19:15, Rick Gutierrez  wrote:
>>
>>> On Wed, 28 Nov 2018 at 6:03, Christian Grunfeld wrote:

>>>> Hi,
>>>>
>>>> this is a log... could you paste the email headers?
>>>>
>>>> cheers

>>> I do not know if it is useful, the amavisd + spamassassin I have it in
>>> front of the mail server.
>>>
>>> https://pastebin.com/ktMUDLps
>
> not available anymore :-(
> --
> Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
> Warning: I wish NOT to receive e-mail advertising to this address.
> Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
> The 3 biggets disasters: Hiroshima 45, Tschernobyl 86, Windows 95

Re: openssl 1.1.1 , FreeBSd 11.2 and spamassassin-3.4.2_2

2018-11-30 Thread The Doctor
On Fri, Nov 30, 2018 at 04:08:36PM -0500, Bill Cole wrote:
> On 30 Nov 2018, at 15:17, The Doctor wrote:
> 
> > Just ran sa-update  using gnupg2
> >
> > and got
> >
> > channel: SHA512 verification failed, channel failed
> >
> > Why did that happen?
> 
> Because the SHA512 verification of an update file failed, causing the 
> channel to fail. Just like it says.
> 
> If you give sa-update the "-D" option, you will get a verbose 
> description of everything sa-update is doing, which will make more 
> useful details regarding the failure available. There is even a strong 
> chance that a second attempt will not fail, since some known failure 
> modes are inherently transient.
>

I will stick with what you said

sa-update -D
Nov 30 14:53:12.329 [74107] dbg: logger: adding facilities: all
Nov 30 14:53:12.329 [74107] dbg: logger: logging level is DBG
Nov 30 14:53:12.329 [74107] dbg: generic: SpamAssassin version 3.4.2
Nov 30 14:53:12.329 [74107] dbg: generic: Perl 5.026002, PREFIX=/usr/local, 
DEF_RULES_DIR=/usr/local/share/spamassassin, 
LOCAL_RULES_DIR=/usr/local/etc/mail/spamassassin, 
LOCAL_STATE_DIR=/var/db/spamassassin
Nov 30 14:53:12.329 [74107] dbg: config: timing enabled
Nov 30 14:53:12.334 [74107] dbg: config: score set 0 chosen.
Nov 30 14:53:12.349 [74107] dbg: generic: sa-update version 3.4.2 / svn1840377
Nov 30 14:53:12.349 [74107] dbg: generic: using update directory: 
/var/db/spamassassin/3.004002
Nov 30 14:53:12.770 [74107] dbg: diag: perl platform: 5.026002 freebsd
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Digest::SHA, 
version 5.96
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: HTML::Parser, 
version 3.72
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Net::DNS, 
version 1.19
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: NetAddr::IP, 
version 4.079
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Time::HiRes, 
version 1.9741
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Archive::Tar, 
version 2.24
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: IO::Zlib, 
version 1.10
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: Digest::SHA1, 
version 2.13
Nov 30 14:53:12.770 [74107] dbg: diag: [...] module installed: MIME::Base64, 
version 3.15
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: DB_File, version 
1.84
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Net::SMTP, 
version 3.10
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Mail::SPF, 
version v2.009
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Geo::IP, version 
1.51
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Net::CIDR::Lite, 
version 0.21
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: 
Razor2::Client::Agent, version 2.84
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: IO::Socket::IP, 
version 0.38
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: 
IO::Socket::INET6, version 2.72
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: IO::Socket::SSL, 
version 2.060
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Compress::Zlib, 
version 2.074
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Mail::DKIM, 
version 0.54
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: DBI, version 
1.642
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Getopt::Long, 
version 2.49
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: LWP::UserAgent, 
version 6.36
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: HTTP::Date, 
version 6.02
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: 
Encode::Detect::Detector, version 1.01
Nov 30 14:53:12.771 [74107] dbg: diag: [...] module installed: Net::Patricia, 
version 1.22
Nov 30 14:53:12.772 [74107] dbg: diag: [...] module installed: 
Net::DNS::Nameserver, version 1692
Nov 30 14:53:12.772 [74107] dbg: diag: [...] module installed: BSD::Resource, 
version 1.2911
Nov 30 14:53:12.773 [74107] dbg: gpg: Searching for 'gpg'
Nov 30 14:53:12.774 [74107] dbg: util: current PATH is: 
/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/root/bin
Nov 30 14:53:12.774 [74107] dbg: util: executable for gpg was found at 
/usr/local/bin/gpg
Nov 30 14:53:12.774 [74107] dbg: gpg: found /usr/local/bin/gpg
Nov 30 14:53:12.782 [74107] dbg: gpg: importing default keyring to 
/usr/local/etc/mail/spamassassin/sa-update-keys
Nov 30 14:53:12.797 [74107] dbg: gpg: [GNUPG:] IMPORT_OK 0 
5E541DC959CB8BAC7C78DFDC4056A61A5244EC45
Nov 30 14:53:12.797 [74107] dbg: gpg: [GNUPG:] IMPORT_RES 1 0 0 0 1 0 0 0 0 0 0 
0 0 0 0
Nov 30 14:53:12.797 [74107] dbg: gpg: release trusted key id list: 
0C2B1D7175B852C64B3CDC716C55397824F434CE 
5E541DC959CB8BAC7C78DFDC4056A61A5244EC45
Nov 30 14:53:12.808 [74107] dbg: util: secure_tmpfile created a temporary file 
/tmp/.spamassassin74107JqCXOVtmp
Nov 30 14:53:12.808 [7

Re: Bayes underperforming, HTML entities?

2018-11-30 Thread Amir Caspi
On Nov 30, 2018, at 7:00 AM, Bill Cole 
 wrote:
> 
>> Since HTML is already getting rendered to text, then perhaps the conversion 
>> code should strip (literally, just delete) any zero-width characters during 
>> this conversion? That should make normal body rules, and Bayes, function 
>> properly, no?
> 
> Not if they are *looking for* those characters.

But AFAIK we're only looking for those characters with rawbody rules, because 
it's really hard to search for them in regular body rules... no?  I'm not 
trying to advocate for removal of rawbody rules, but rather making it easier 
for normal body rules to work.

But RW's suggestion is probably a good one: offer both paths:

On Nov 30, 2018, at 7:46 AM, RW  wrote:
> 
> It make it harder to write rules detecting these tricks, but it may
> happen eventually. As far as Bayes is concerned, it would be a shame to
> lose the information.

I'm not sure I see how Bayes can take decent advantage out of these zero-width 
chars.  If they are interspersed randomly within words, then Bayes has to 
tokenize each and every permutation (or, at least, very many permutations) of 
each word in order to be decently effective.  But if the zero-width chars are 
stripped out, then Bayes only has to tokenize the regular, displayable word.  
Am I missing something?

But offering both converted and non-converted options is likely the best 
option, and then having Bayes work on the normalized version resolves the above.

--- Amir



Re: openssl 1.1.1 , FreeBSd 11.2 and spamassassin-3.4.2_2

2018-11-30 Thread Bill Cole

On 30 Nov 2018, at 16:57, The Doctor wrote:


On Fri, Nov 30, 2018 at 04:08:36PM -0500, Bill Cole wrote:

On 30 Nov 2018, at 15:17, The Doctor wrote:


Just ran sa-update  using gnupg2

and got

channel: SHA512 verification failed, channel failed

Why did that happen?


Because the SHA512 verification of an update file failed, causing the
channel to fail. Just like it says.

If you give sa-update the "-D" option, you will get a verbose
description of everything sa-update is doing, which will make more
useful details regarding the failure available. There is even a strong
chance that a second attempt will not fail, since some known failure
modes are inherently transient.



I will stick with what you said

sa-update -D


[...]

Looks normal until near the end:

Nov 30 14:53:15.964 [74107] dbg: channel: selected mirror 
http://sa-update.spamassassin.org
Nov 30 14:53:15.964 [74107] dbg: http: url: 
http://sa-update.spamassassin.org/1847701.tar.gz
Nov 30 14:53:15.964 [74107] dbg: http: downloading to: 
/var/db/spamassassin/3.004002/updates_spamassassin_org/1847701.tar.gz, 
update
Nov 30 14:53:15.964 [74107] dbg: util: executable for curl was found 
at /usr/local/bin/curl
Nov 30 14:53:15.965 [74107] dbg: http: /usr/local/bin/curl -s -L -O 
--remote-time -g --max-redirs 2 --connect-timeout 30 --max-time 300 
--fail -o 1847701.tar.gz -z 1847701.tar.gz -- 
http://sa-update.spamassassin.org/1847701.tar.gz
Nov 30 14:53:18.418 [74107] dbg: http: process [74232], exit status: 
exit 0
Nov 30 14:53:18.420 [74107] dbg: http: url: 
http://sa-update.spamassassin.org/1847701.tar.gz.sha512
Nov 30 14:53:18.420 [74107] dbg: http: downloading to: 
/var/db/spamassassin/3.004002/updates_spamassassin_org/1847701.tar.gz.sha512, 
update
Nov 30 14:53:18.421 [74107] dbg: util: executable for curl was found 
at /usr/local/bin/curl
Nov 30 14:53:18.421 [74107] dbg: http: /usr/local/bin/curl -s -L -O 
--remote-time -g --max-redirs 2 --connect-timeout 30 --max-time 300 
--fail -o 1847701.tar.gz.sha512 -z 1847701.tar.gz.sha512 -- 
http://sa-update.spamassassin.org/1847701.tar.gz.sha512
Nov 30 14:53:20.259 [74107] dbg: http: process [74286], exit status: 
exit 0
Nov 30 14:53:20.260 [74107] dbg: http: url: 
http://sa-update.spamassassin.org/1847701.tar.gz.sha256
Nov 30 14:53:20.260 [74107] dbg: http: downloading to: 
/var/db/spamassassin/3.004002/updates_spamassassin_org/1847701.tar.gz.sha256, 
update
Nov 30 14:53:20.260 [74107] dbg: util: executable for curl was found 
at /usr/local/bin/curl
Nov 30 14:53:20.260 [74107] dbg: http: /usr/local/bin/curl -s -L -O 
--remote-time -g --max-redirs 2 --connect-timeout 30 --max-time 300 
--fail -o 1847701.tar.gz.sha256 -z 1847701.tar.gz.sha256 -- 
http://sa-update.spamassassin.org/1847701.tar.gz.sha256
Nov 30 14:53:22.161 [74107] dbg: http: process [74329], exit status: 
exit 0
Nov 30 14:53:22.162 [74107] dbg: http: url: 
http://sa-update.spamassassin.org/1847701.tar.gz.asc
Nov 30 14:53:22.162 [74107] dbg: http: downloading to: 
/var/db/spamassassin/3.004002/updates_spamassassin_org/1847701.tar.gz.asc, 
update
Nov 30 14:53:22.163 [74107] dbg: util: executable for curl was found 
at /usr/local/bin/curl
Nov 30 14:53:22.163 [74107] dbg: http: /usr/local/bin/curl -s -L -O 
--remote-time -g --max-redirs 2 --connect-timeout 30 --max-time 300 
--fail -o 1847701.tar.gz.asc -z 1847701.tar.gz.asc -- 
http://sa-update.spamassassin.org/1847701.tar.gz.asc
Nov 30 14:53:23.603 [74107] dbg: http: process [74380], exit status: 
exit 0
Nov 30 14:53:23.607 [74107] dbg: sha512: verification wanted: 
ae6c6249e8a63d4512331ec91e42bf0ba6ead2f8ba323200ebbfe4ed44bf9902635c7ecc7a3b392bdaddc96f070f8fd0293475dace317923854a32ba5238d93d


That's the content of the downloaded 1847701.tar.gz.sha512 file, which 
is the SHA512 hash of the 1847701.tar.gz on the update servers. It 
matches the content of the same file that I just retrieved from the 
update server, so your transfer of that file worked.


Nov 30 14:53:23.607 [74107] dbg: sha512: verification result: 
88fd9fa22e55c00365b8d0548a7ce8fc8c5ac08c339ca383663b5b735337b2ef2a52a83021b6608f186b4163556a8b8d9ecef14c775717294607925577a0dd9f


That's the actual SHA512 hash of the downloaded 1847701.tar.gz file. 
Obviously it does not match the hash of that file on the server, so 
there was something wrong with the download. I've just downloaded 
1847701.tar.gz myself from the same server and verified that my 
download DID verify, unpack correctly, and match what sa-update 
installed for me last night, so the problem is not with the files on the 
server but rather specifically with the download process or storage on 
your system resulting in a corrupted 1847701.tar.gz file.
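
The comparison behind the "verification wanted" and "verification result" lines can be reproduced by hand. Below is a minimal Python sketch, not sa-update's own code; it assumes the digest is the first whitespace-separated field of the .sha512 file, and it uses a throwaway temporary file so that it runs without a real update archive.

```python
import hashlib
import os
import tempfile


def sha512_of(path, chunk_size=1 << 16):
    """Hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(archive_path, sha512_file_text):
    """Return (ok, wanted, actual), mirroring the 'wanted' and 'result' log lines."""
    # Assumption: the digest is the first field of the checksum file.
    wanted = sha512_file_text.split()[0].lower()
    actual = sha512_of(archive_path)
    return wanted == actual, wanted, actual


# Demonstration with a stand-in file instead of a real 1847701.tar.gz.
payload = b"stand-in for an update archive"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
good = hashlib.sha512(payload).hexdigest()
print(verify(tmp.name, good + "  archive.tar.gz\n")[0])  # matching digest
print(verify(tmp.name, "0" * 128)[0])                     # mismatching digest
os.remove(tmp.name)
```

Pointing sha512_of at the archive left in the update directory and comparing against the .sha512 file next to it shows whether the stored file itself is corrupt.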


When a channel fails to verify, sa-update refrains from deleting the 
files it downloaded for that channel. As 
indicated above, those were all downloaded to 
/var/db/spamassassin/3.004002/updates_spamassassin_org/ and so should 
still be present. Check the size of the 1847701.tar.gz file

Re: Bayes underperforming, HTML entities?

2018-11-30 Thread Bill Cole

On 30 Nov 2018, at 17:49, Amir Caspi wrote:

On Nov 30, 2018, at 7:00 AM, Bill Cole 
 wrote:


Since HTML is already getting rendered to text, then perhaps the 
conversion code should strip (literally, just delete) any zero-width 
characters during this conversion? That should make normal body 
rules, and Bayes, function properly, no?


Not if they are *looking for* those characters.


But AFAIK we're only looking for those characters with rawbody rules,


Not so.

because it's really hard to search for them in regular body rules... 
no?


No.

See the relevant rule cluster (all with 'ZW' in their names) in KAM.cf 
and __UNICODE_OBFU_ZW in the standard ruleset.


Also see my more generic (but still useful!) __SCC_SHORT_WORDS and 
derivatives in KAM.cf: it is a body rule that takes advantage of the 
fact that zero-width typographical control characters create logical 
word breaks as far as Perl is concerned.
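
The word-break effect can be shown with a small Python sketch. The pattern below is hypothetical and is not the rule from KAM.cf; it only demonstrates that U+200B is a non-word character, so a word with zero-width spaces between its letters looks like a run of very short "words". Python's and Perl's definitions of a word character differ for some code points, so treat this as an illustration only.

```python
import re

# Hypothetical pattern: six or more consecutive "words" of one or two
# word characters, each followed by a single non-word character.
SHORT_WORD_RUN = re.compile(r"(?:\b\w{1,2}\b\W){6,}")


def has_short_word_run(text):
    """True if the text contains a long run of one- or two-character words."""
    return SHORT_WORD_RUN.search(text) is not None


# Put a zero-width space between every pair of characters.
obfuscated = "\u200b".join("malware has copied")
print(has_short_word_run(obfuscated))
print(has_short_word_run("malware has copied your address book"))
```

Legitimate text made of many short words in a row would also match, so a pattern like this would need scoring and meta rules around it.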





--
Bill Cole


Re: Bayes underperforming, HTML entities?

2018-11-30 Thread RW
On Fri, 30 Nov 2018 15:49:31 -0700
Amir Caspi wrote:

> > It make it harder to write rules detecting these tricks, but it may
> > happen eventually. As far as Bayes is concerned, it would be a
> > shame to lose the information.  
> 
> I'm not sure I see how Bayes can take decent advantage out of these
> zero-width chars.  If they are interspersed randomly within words,
> then Bayes has to tokenize each and every permutation (or, at least,
> very many permutations) of each word in order to be decently
> effective.  But if the zero-width chars are stripped out, then Bayes
> only has to tokenize the regular, displayable word.  Am I missing
> something?

Yes, you need something in between. A tokenization that avoids
learning the hundreds of obfuscation variants, but doesn't throw away
the existence of obfuscation. 

> But offering both converted and non-converted options is likely the
> best option, and then having Bayes work on the normalized version
> resolves the above.

Not simply on the normalized text, that way you lose information. In the
example I gave, the word:
 
  has 

would get tokenized twice, once through the body and once through
the list of obfuscated words in the pseudo-header, producing the tokens:

   'has'
   'HX-Obfuscated-Norm:has'

the former token would likely be neutral and drop out, but the second
would probably only appear in spam. 

The upshot of this is that invisible obfuscation:

- no longer breaks body rules
- is easier for Bayes to learn than non-obfuscated text
- can still be tested via X-Obfuscated-Orig without the
  complexity of rawbody
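
A minimal sketch of the double tokenization described above, in Python. It is far simpler than the real Bayes tokenizer; the prefix string is taken from the example tokens above, and the function name is hypothetical.

```python
def bayes_tokens(body_text, obfuscated_norm):
    """Body words as plain tokens, pseudo-header words with a dedicated prefix."""
    tokens = [word.lower() for word in body_text.split()]
    tokens += ["HX-Obfuscated-Norm:" + word.lower() for word in obfuscated_norm.split()]
    return tokens


tokens = bayes_tokens("my malware has copied your address book", "malware has address")
print(tokens)
```

The word 'has' therefore appears twice: once as a neutral body token and once as a prefixed token that only obfuscated messages can produce.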


Re: spoofing mail

2018-11-30 Thread John Hardin

On Fri, 30 Nov 2018, Rupert Gallagher wrote:

Although the RFC allows muas not to include the mid, the same RFC does 
not mandate mtas to accept them. Since 100% of such emails on our 
records are spam, then we reject them upfront.


...and if you're adopting that policy, then configure your MTA to reject 
messages missing a Message-ID during the SMTP phase before it ever touches 
SA.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 610 days since the first commercial re-flight of an orbital booster (SpaceX)