Hi Micah,

On Wed, 16 Mar 2022, Micah Snyder (micasnyd) wrote:

>> (1) a plea for a way to test rules before they go live;
>
> If you mean "for personal use" then I'd say, "What Maarten said."

Er, no.  Not "scan to make sure it detects things".  What I meant was
"do something to make sure it won't e.g. crash clamd when it tries to
scan something after this rule has been loaded" - but see (2) below.

>> (2) another plea for a parser which is good at its job;
>
> I'm not sure what you mean here.  Can you elaborate?  If you simply
> want ClamAV to ignore garbage rules on load and continue with the rest
> of the file (see point #4) - that's something we can easily improve
> regardless of what we do. And that's how our yara rule loading logic
> works right now.

I strongly feel that if it finds a problem, rather than silently load
some sub-optimal ruleset the parser should abandon the reload of the
entire ruleset.  Obviously it should warn when it does that.  I guess
this might be an issue if it's running on a machine with too little
RAM to reload while simultaneously scanning with the previous ruleset,
but something like a --test-ruleset option could probably handle that.
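
To make the all-or-nothing behaviour concrete, here is a minimal Python
sketch.  Everything in it is hypothetical: `RulesetError`,
`reload_all_or_nothing` and the `compile_rule` callback are illustrative
names, not part of ClamAV or libyara.  The only point is that the old
ruleset stays in service unless every new rule compiles.

```python
import logging

log = logging.getLogger("ruleset")


class RulesetError(Exception):
    """Raised by the (hypothetical) compiler when a rule cannot be loaded."""


def reload_all_or_nothing(current, sources, compile_rule):
    """Return a freshly compiled ruleset, or `current` if anything fails.

    `sources` maps a rule-file name to its text; `compile_rule(name, text)`
    is assumed to raise RulesetError on any problem.
    """
    candidate = []
    for name, text in sources.items():
        try:
            candidate.append(compile_rule(name, text))
        except RulesetError as exc:
            # Warn loudly and abandon the whole reload, not just this rule.
            log.warning("reload abandoned, keeping previous ruleset: %s: %s",
                        name, exc)
            return current
    return candidate
```

A `--test-ruleset` style option would then amount to running only the
compile loop and reporting the result, without swapping anything in.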

The following is from something I was doing back in June 2021, so it's
before your pull request of 2021.08.21:

https://github.com/Cisco-Talos/clamav/pull/261

and I haven't retested, but these are the sorts of things that were
driving me crazy around the middle of last year:

8<----------------------------------------------------------------------
$ diff -U3 Garbage_Rules.yar.~140~.NBG Garbage_Rules.yar.~141~.OK
--- Garbage_Rules.yar.~140~.NBG 2021-06-13 08:05:54.218256634 +0100
+++ Garbage_Rules.yar.~141~.OK  2021-06-13 08:08:11.025783287 +0100
@@ -30,7 +30,7 @@
  strings:
    $ = /update from GOV.{1,10}UK/ ascii nocase
  condition:
-   any of them and not Blacklist_1
+   not Blacklist_1 and any of them
 }
8<----------------------------------------------------------------------
$ cat does_not_notice_missing_curly_brace
private rule Email_marketing
{
  strings:
    $ = "email marketing" ascii nocase  // Testing
  condition:
    any of them
}

// Test private rule
rule Garbage_spam_testing_Rule
  strings:
    $TLD_4_to_20_chars = /htt(p|ps):\/\/[-a-z0-9]{3,50}\.[a-z]{4,20}\/./ ascii nocase
    $ = "email marketing" ascii nocase
  condition:
    all of them
}
8<----------------------------------------------------------------------
$ cat does_not_notice_missing_dollar_symbol
--- Garbage_Rules.yar.~297~ 2021-07-30 14:43:26.540758502 +0100
+++ Garbage_Rules.yar   2021-07-30 14:46:30.277470587 +0100
@@ -29,7 +29,7 @@
rule test_single_string
{
  strings:
-   = /cc.{1,3}ab...@jubileegroup.co.uk/ ascii nocase
+ $ = /cc.{1,3}ab...@jubileegroup.co.uk/ ascii nocase
  condition:
    any of them
}
8<----------------------------------------------------------------------
$ cat five_more_yara_bugs
See Garbage rules of late June to early July 2021.

1. It doesn't notice if you have more than one string with the same name.

2. If you have a string with a name that isn't referenced in the condition,
   it crashes.

3. It crashes if you mistakenly write (see 199-200) something like
    condition:
        Spam_trap and ( any of ($spammer_*) or any of ($warning_*) or (#publish_* > 4) )

4. It crashes if you mistakenly write something like .*{range} (for example
   see 200-201)
        $ = /we.*{1,50}(sell|sale)/ ascii nocase
   which should be
        $ = /we.{1,50}(sell|sale)/ ascii nocase

5. If you want to match "Alfreton, Derbyshire" the string
   "Alfreton, Derbyshire" *does* match if you use the form

        $ = "Alfreton, Derbyshire" ascii nocase

   but it does *not* match if you use the form

        $ = "Alfreton, Derbyshire" ascii
8<----------------------------------------------------------------------
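
As a rough illustration of the kind of pre-load check that would have
caught items 1 and 4 in that list, here is a small standalone lint in
Python.  It is only a sketch under stated assumptions: the function name
`lint_rule_text` is made up, it expects the text of a *single* rule
(string names are per-rule), and it uses naive regular expressions
rather than a real Yara parser, so it can miss things or raise false
alarms.

```python
import re
from collections import Counter

# Named string definition such as `$foo = ...`; anonymous `$ = ...` is skipped.
_NAMED_STRING = re.compile(r"^\s*\$(\w+)\s*=", re.MULTILINE)

# An unescaped quantifier immediately followed by a {range}, e.g. `.*{1,50}`.
_STACKED_QUANTIFIER = re.compile(r"(?<!\\)[*+?]\{\d+(,\d*)?\}")


def lint_rule_text(text):
    """Return a list of human-readable problems found in one rule's text."""
    problems = []

    # Item 1: the same string name defined more than once.
    counts = Counter(_NAMED_STRING.findall(text))
    for name, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"string ${name} defined {n} times")

    # Item 4: something like `.*{1,50}` where `.{1,50}` was intended.
    for lineno, line in enumerate(text.splitlines(), 1):
        if _STACKED_QUANTIFIER.search(line):
            problems.append(f"line {lineno}: quantifier followed by {{range}}")

    return problems
```

Run over a rule before it goes anywhere near clamd, an empty list means
"nothing obviously wrong", not "this rule is safe".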

See also e.g.

https://bugzilla.clamav.net/show_bug.cgi?id=12095

While I was looking at this I also came upon another quirk that can be
a bit of a nuisance.  AFAICT Yara strings can only be delimited by one
of two characters, either a double-quote (for a literal string) or a
forward-slash (for a regex).  It would help to be able to choose the
quote character like in Perl; if not, at least having more available
to choose from could make many expressions more readable, especially
those which target e.g. HTML and links in mail (both of which tend to
have many occurrences of double-quote or forward-slash characters).

>> (3) a way to specify that a rule is to match in
>>     (a) mail headers only or
>>     (b) mail body only or
>>     (c) both;
>
> This is a neat idea.  It is a new signature language feature request
> and is a great example of something that would be hard to implement
> in the current clamav signature language(s).  If you have any ideas
> on how this may be expressed either in the "clamav yara extensions"
> idea or in the proposed "KDL-based signature language" or some other
> proposed format, I'd love some examples.

In Yara, something like

rule only_match_RFQ_if_found_in_subject_header
{
    strings:
        $a = /^Subject:\s*RFQ.*$/
    condition:
        mail_header and any of them
}

should do it.  The 'mail_header' condition would mean "We're scanning
an RFC5322 mail message and additionally this match looks only at the
bit before the first blank line in the message text" - what in RFC6522
is called the "Full Header Section" (section 2, page 7).  Of course a
'mail_body' condition would mean "... and additionally this match only
considers the bit *after* the first blank line in the message text" or
in the RFC6522 definition, the "Full Body".  It's very simple to split
a mail message into these two parts.  The delimiter is just the first
blank line in the text.  The rule wouldn't match at all if we aren't
scanning something which ClamAV has decided is a mail message.
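
The split described above can be sketched in a few lines of Python.
This is only an illustration of the idea, not ClamAV code: `split_mail`
and `header_only_match` are hypothetical names, and the sketch assumes
the scanner has already decided that the input is a mail message.

```python
import re

# The first blank line, tolerating both CRLF and bare LF line endings.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")

# Python rendering of the example rule's string, applied per line.
_SUBJECT_RFQ = re.compile(rb"^Subject:\s*RFQ.*$", re.MULTILINE)


def split_mail(raw):
    """Split a raw message into (header section, body) at the first blank line."""
    m = _BLANK_LINE.search(raw)
    if m is None:
        # No blank line at all: treat everything as header, with an empty body.
        return raw, b""
    return raw[:m.start()], raw[m.end():]


def header_only_match(raw):
    """Emulate 'mail_header and any of them' for the Subject/RFQ example."""
    header, _body = split_mail(raw)
    return _SUBJECT_RFQ.search(header) is not None
```

A 'mail_body' condition would be the same thing with the second element
of the pair, and (c) is simply today's behaviour of matching against the
whole message.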

>> (4) it would be great to have a way to reload rulesets separately so
>> it isn't necessary to reload ten million signatures when you've only
>> added one Yara rule, only then to find clamd crashes the first time it
>> tries to scan anything because you broke that rule.  I understand this
>> might be asking a lot ...

> Asking Clam to load additional rules to an existing engine while
> scans are ongoing is tricky, but potentially doable.  It throws a
> wrench into the works for some hardening ideas I'm proposing ...

As I said, I understand it might be asking a lot.  I hadn't considered
that it might militate against hardening, and I think hardening should
take priority.

--

73,
Ged.

_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


