Re: Unicode considered harmful again

2021-11-04 Thread Rupert Gallagher
On Nov 4, 2021, 07:45, Damian < spamassas...@arcsin.de> wrote:

>> Please convert all source code to ASCII. If it fails to compile, then it may 
>> have a trojan hiding in Unicode clothing.

>Instructions unclear.

CVE-2021-42574

Re: Unicode considered harmful again

2021-11-04 Thread Damian
>> Please convert all source code to ASCII. If it fails to compile,
>> then it may have a trojan hiding in Unicode clothing.

> Instructions unclear.

CVE-2021-42574


It remains unclear (to me). What source code should spamassassin-users 
convert? Attached source code in emails? How should they convert, is 
there a SpamAssassin-Plugin? Should they install compilers on their mail 
system?




Re: Unicode considered harmful again

2021-11-04 Thread Rupert Gallagher
On Nov 4, 2021, 09:34, Damian < spamassas...@arcsin.de> wrote:
> >> Please convert all source code to ASCII. If it fails to compile,
> >> then it may have a trojan hiding in Unicode clothing.
>
> > Instructions unclear.
>
> CVE-2021-42574

> It remains unclear (to me). What source code should spamassassin-users 
> convert? Attached source code in emails? How should they convert, is there a 
> SpamAssassin-Plugin? Should they install compilers on their mail system?

The CVE is a call to action for the developers. As for users: if SA can
reliably detect an attack, it should report it.
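
As a rough illustration of what "detecting an attack" could mean for CVE-2021-42574, the sketch below looks for the Unicode bidirectional control characters that the "Trojan Source" technique relies on. It is a minimal Python sketch, not SpamAssassin code; the names `BIDI_CONTROLS` and `find_bidi_controls` are made up for this example.

```python
# Bidirectional control characters abused by CVE-2021-42574 ("Trojan Source").
# The short names are the standard Unicode abbreviations.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def find_bidi_controls(text):
    """Return a list of (line_number, column, name) for each bidi control."""
    hits = []
    # Split on "\n" only, so line numbers match what an editor shows.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, BIDI_CONTROLS[ch]))
    return hits
```

Any hit in source code or in a rule file is worth a manual look; legitimate right-to-left text in a message body can also contain these characters, so this is a hint rather than proof of an attack.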

Re: Unicode considered harmful again

2021-11-04 Thread Bill Cole

On 2021-11-04 at 08:45:02 UTC-0400 (Thu, 4 Nov 2021 08:45:02 -0400)
Jared Hall is rumored to have said:

> [...]
>
> 2) Beware of using somebody else's source code :)


That's the really significant warning...

The relevance to SA is that it uses a config system with "rules" that 
can be auto-updated and which are de facto source code: somebody else's 
source code. :)


We do not currently publish non-ASCII rules in the default ruleset 
channel. I don't believe that KAM ever does so. At least one 3rd-party 
ruleset has done so in the past, generating errors and warnings from 
some versions of Perl. Through 3.x, SA has no deliberate support for 
non-ASCII rules, and while it is possible that SA could be vulnerable 
to something akin to CVE-2021-42574 and CVE-2021-42694 via malicious 
rules, it would be a noisy and rather difficult attack.


In v4.x, Unicode support will be better. That also means it may be 
easier to make this sort of attack quieter in the future, as non-ASCII 
rules won't be definitively wrong as they are now.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Unicode considered harmful again

2021-11-04 Thread Jared Hall

On 11/4/2021 10:44 AM, Bill Cole wrote:

> On 2021-11-04 at 08:45:02 UTC-0400 (Thu, 4 Nov 2021 08:45:02 -0400)
> Jared Hall is rumored to have said:
>
> > [...]
> >
> > 2) Beware of using somebody else's source code :)
>
> That's the really significant warning...


Agreed.  Does one need to write a paper and publish a couple of CVEs for 
that?  I thought Mitre or whoever runs CVE nowadays would triage these 
types of reports through a "Captain Obvious" department to sort Wants 
from Needs.




> We do not currently publish non-ASCII rules in the default ruleset 
> channel. I don't believe that KAM ever does so.


KAM certainly has.  I do recall seeing at least an infinity symbol as 
well as the Euro symbol in his rulesets last I looked.  NBD, works 
anyway.  I crank out hex when dealing with Unicode, and I have tons of 
that.  I have a nice Unicode converter that works on strings.  One of 
these days I'll change it to parse entire files; Heinlein's stuff for 
instance.
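
For readers who have not done this before, a string-to-hex converter of the kind mentioned above might look roughly like the following Python sketch. This is not Jared's tool; the function name `to_hex_escapes` is hypothetical, and the sketch assumes the rule should match the UTF-8 bytes of each non-ASCII character.

```python
def to_hex_escapes(s):
    # Replace each non-ASCII character with backslash-x escapes of its
    # UTF-8 bytes, leaving ASCII characters untouched. The result is
    # pure ASCII and can be pasted into a regex.
    out = []
    for ch in s:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend("\\x{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    print(to_hex_escapes("10 € off"))
```

The converted pattern contains only ASCII, so it does not depend on how the rule file itself is decoded.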


> In v4.x, Unicode support will be better. That also means it may be 
> easier to make this sort of attack quieter in the future, as non-ASCII 
> rules won't be definitively wrong as they are now.


I have my own thoughts/reservations about distributing Unicode 
rulesets.  Challenging days ahead, to be sure.  It'd sure be nice to get 
sa-compile to run entirely clean though.


Thanks,

-- Jared Hall


Re: timeouts on processing some messages, started October 24

2021-11-04 Thread Greg Troxel

I have captured a bad message.   It seems innocuous; it's from me at a
host in my domain, to me, basically

From: g...@foo.lexort.com
To: g...@lexort.com

and has a body "foo", no DKIM headers, just Received, Subject,
Message-Id.


Processing this with my normal config results in the timeout.


I noticed lockfiles for txrep, even though I couldn't tell from '-D all'
that txrep was involved, and turned off txrep in my config
("use_txrep 0" instead of 1).  Then the message processes in 2s.

When I had txrep enabled, I saw a tx-reputation.lock with a single line
that was a pid of the spamd child process that was accumulating CPU
time.  I also had files like:
  tx-reputation.lock.bar.lexort.com.5023
where that was another pid, and this second file seemed to be
accumulating lines.

I did find a stray sa-learn from October and killed it.

Running my spam learning script, which just calls sa-learn with --spam
or --ham (and -L always) is turning out slow, probably from the same thing.

So it sort of smells like one of
  - something is wrong with my txrep database
  - some code is hitting O(n^k) or something
  - there is some strange locking/spinning behavior
  - something else I don't understand, as always
  


Does anyone have pointers to a database export/import script for txrep?




Re: Unicode considered harmful again

2021-11-04 Thread Benny Pedersen

On 2021-11-04 09:34, Damian wrote:

> >> Please convert all source code to ASCII. If it fails to compile,
> >> then it may have a trojan hiding in Unicode clothing.
>
> > Instructions unclear.
>
> CVE-2021-42574
>
> It remains unclear (to me). What source code should spamassassin-users
> convert? Attached source code in emails? How should they convert, is
> there a SpamAssassin-Plugin? Should they install compilers on their
> mail system?


https://bugs.gentoo.org/807781

not all 3rd-party channels have clean rules, which leads to that problem

==
$ perl -ne 'print "$. $_" if m/[\x80-\xFF]/' /var/lib/spamassassin/3.004006/updates_spamassassin_org/50_scores.cf

526 # Validity (née ReturnPath) Certified
==

i have not tested whether it is solved in the default rules now, but the 
kam and ita channels still have it
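
The same check as the Perl one-liner above can be sketched in Python for anyone who wants to scan several channels from a script. This is only an illustrative equivalent; the function name `non_ascii_lines` is made up here, and it works on raw bytes so that it does not depend on the file's encoding.

```python
import re

# Any byte with the high bit set, same class as the Perl one-liner.
HIGH_BYTE = re.compile(rb"[\x80-\xFF]")


def non_ascii_lines(data):
    """Yield (line_number, line) for each line containing a byte >= 0x80."""
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        if HIGH_BYTE.search(line):
            yield lineno, line


def scan_file(path):
    """Read a rule file as bytes and return its non-ASCII lines as a list."""
    with open(path, "rb") as fh:
        return list(non_ascii_lines(fh.read()))
```

Calling `scan_file` on the `50_scores.cf` path shown above should report the same line numbers as the Perl command, since both count lines by newline.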


we are all waiting for spamassassin 4.x


Re: Unicode considered harmful again

2021-11-04 Thread Loren Wilton
> In v4.x, Unicode support will be better. That also means it may be easier 
> to make this sort of attack quieter in the future, as non-ASCII rules 
> won't be definitively wrong as they are now.


The question is whether malicious non-ASCII rules could do anything more 
damaging than simply failing to match on the obvious strings "visible" in 
the rule, or alternatively deliberately matching on some string that should 
not be matched, in some form of DoS attempt.
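
The first failure mode, a rule that looks right but never matches, is easy to show with a lookalike character. The Python sketch below uses Python's `re` module rather than SA's Perl regex engine, and the pattern and the helper name `suspicious_chars` are invented for illustration.

```python
import re
import unicodedata

visible = "paypal"            # what a reviewer believes the rule says
spoofed = "p\u0430yp\u0430l"  # Cyrillic U+0430 in place of Latin "a"


def suspicious_chars(pattern):
    """Return (char, Unicode name) for every non-ASCII character in a pattern."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in pattern
        if ord(ch) > 0x7F
    ]


if __name__ == "__main__":
    text = "visit paypal today"
    print(bool(re.search(visible, text)))  # the honest pattern matches
    print(bool(re.search(spoofed, text)))  # the lookalike pattern does not
    print(suspicious_chars(spoofed))
```

Both patterns render almost identically in most fonts, which is why listing the non-ASCII characters by name is more reliable than reading the rule by eye.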


It's hard to see how someone could inject Perl (or any other) code with 
screwy rules. There was a time when Perl code was allowed in rules, but 
that was disallowed many years ago:


   uri  LW_PRINTIT   /(^.*$)(?{ print "URI:\n$^N\nEnd URI\n\n" })/is

That was a real handy debugging rule once, but you can't get away with that 
anymore.


   Loren