Re: [PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

2020-10-27 Thread Daniel P . Berrangé
On Tue, Oct 27, 2020 at 01:09:38PM +0100, Peter Krempa wrote:
> On Tue, Oct 27, 2020 at 10:04:33 +, Daniel Berrange wrote:
> > On Tue, Oct 27, 2020 at 10:53:12AM +0100, Peter Krempa wrote:
> > > On Mon, Oct 26, 2020 at 16:08:34 +, Daniel Berrange wrote:
> > > > On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
> > > > > Glib's hash table provides basically the same functionality as our 
> > > > > hash
> > > > > table.
> > > > > 
> > > > > In most cases the only thing that remains in the virHash* wrappers is
> > > > > NULL-checks of '@table' argument as glib's hash functions don't 
> > > > > tolerate
> > > > > NULL.
> > > > > 
> > > > > In case of iterators, we adapt the existing API of iterators to glibs 
> > > > > to
> > > > > prevent having rewrite all callers at this point.
> > > > > 
> > > > > Signed-off-by: Peter Krempa 
> > > > > ---
> > > > >  src/libvirt_private.syms |   4 -
> > > > >  src/util/meson.build |   1 -
> > > > >  src/util/virhash.c   | 416 
> > > > > ++-
> > > > >  src/util/virhash.h   |   4 +-
> > > > >  src/util/virhashcode.c   | 125 
> > > > >  src/util/virhashcode.h   |  33 
> > > > 
> > > > Our hash code impl uses Murmurhash which makes some efforts to be
> > > > robust against malicious inputs triggering collisons, notably using
> > > > a random seed.
> > > > 
> > > > The new code uses  g_str_hash which is much weaker, and the API
> > > > docs explicitly recommend against using it if the input can be from
> > > > an untrusted user.
> > > 
> > > Yes, I've noticed that, but didn't consider it to be that much of a
> > > problem as any untrusted input which is stored in a hash table (so that
> > > the attacker can use crafted keys) must be in the first place
> > > safeguarded against OOM condition by limiting the input count/size.
> > 
> > The problem isn't OOM, rather it is algorithmic complexity. With malicious
> > hash collisions the runtime lookup performance degrades to O(n) which can
> > cause scalability concerns in some cases.
> 
> I was pointing out that limiting the input size needed for OOM limit
> conveniently limits the size of 'n'.
> 
> The worst case for a malicious actor that I can see is the block device
> statistics code, where the worst case input would be based on 2 * 10 MiB
> of json, where based on 200 bytes per entry you could achieve 100k hash
> comparisons.
> 
> As noted though, I think we can use the better hash function we have.
> 
> The only difference will be probably that the seed will be global and
> not per-table since glibs table doesn't support that. If that's not
> acceptable we need to keep all the code since glibs hash table's hash
> function prototype is:

A one-time seed is likely fine.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

2020-10-27 Thread Peter Krempa
On Tue, Oct 27, 2020 at 10:04:33 +, Daniel Berrange wrote:
> On Tue, Oct 27, 2020 at 10:53:12AM +0100, Peter Krempa wrote:
> > On Mon, Oct 26, 2020 at 16:08:34 +, Daniel Berrange wrote:
> > > On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
> > > > Glib's hash table provides basically the same functionality as our hash
> > > > table.
> > > > 
> > > > In most cases the only thing that remains in the virHash* wrappers is
> > > > NULL-checks of '@table' argument as glib's hash functions don't tolerate
> > > > NULL.
> > > > 
> > > > In case of iterators, we adapt the existing API of iterators to glibs to
> > > > prevent having rewrite all callers at this point.
> > > > 
> > > > Signed-off-by: Peter Krempa 
> > > > ---
> > > >  src/libvirt_private.syms |   4 -
> > > >  src/util/meson.build |   1 -
> > > >  src/util/virhash.c   | 416 ++-
> > > >  src/util/virhash.h   |   4 +-
> > > >  src/util/virhashcode.c   | 125 
> > > >  src/util/virhashcode.h   |  33 
> > > 
> > > Our hash code impl uses Murmurhash which makes some efforts to be
> > > robust against malicious inputs triggering collisons, notably using
> > > a random seed.
> > > 
> > > The new code uses  g_str_hash which is much weaker, and the API
> > > docs explicitly recommend against using it if the input can be from
> > > an untrusted user.
> > 
> > Yes, I've noticed that, but didn't consider it to be that much of a
> > problem as any untrusted input which is stored in a hash table (so that
> > the attacker can use crafted keys) must be in the first place
> > safeguarded against OOM condition by limiting the input count/size.
> 
> The problem isn't OOM, rather it is algorithmic complexity. With malicious
> hash collisions the runtime lookup performance degrades to O(n) which can
> cause scalability concerns in some cases.

I was pointing out that limiting the input size needed for OOM limit
conveniently limits the size of 'n'.

The worst case for a malicious actor that I can see is the block device
statistics code, where the worst case input would be based on 2 * 10 MiB
of json, where based on 200 bytes per entry you could achieve 100k hash
comparisons.

As noted though, I think we can use the better hash function we have.

The only difference will be probably that the seed will be global and
not per-table since glibs table doesn't support that. If that's not
acceptable we need to keep all the code since glibs hash table's hash
function prototype is:

guint
(*GHashFunc) (gconstpointer key);



Re: [PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

2020-10-27 Thread Daniel P . Berrangé
On Tue, Oct 27, 2020 at 10:53:12AM +0100, Peter Krempa wrote:
> On Mon, Oct 26, 2020 at 16:08:34 +, Daniel Berrange wrote:
> > On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
> > > Glib's hash table provides basically the same functionality as our hash
> > > table.
> > > 
> > > In most cases the only thing that remains in the virHash* wrappers is
> > > NULL-checks of '@table' argument as glib's hash functions don't tolerate
> > > NULL.
> > > 
> > > In case of iterators, we adapt the existing API of iterators to glibs to
> > > prevent having rewrite all callers at this point.
> > > 
> > > Signed-off-by: Peter Krempa 
> > > ---
> > >  src/libvirt_private.syms |   4 -
> > >  src/util/meson.build |   1 -
> > >  src/util/virhash.c   | 416 ++-
> > >  src/util/virhash.h   |   4 +-
> > >  src/util/virhashcode.c   | 125 
> > >  src/util/virhashcode.h   |  33 
> > 
> > Our hash code impl uses Murmurhash which makes some efforts to be
> > robust against malicious inputs triggering collisons, notably using
> > a random seed.
> > 
> > The new code uses  g_str_hash which is much weaker, and the API
> > docs explicitly recommend against using it if the input can be from
> > an untrusted user.
> 
> Yes, I've noticed that, but didn't consider it to be that much of a
> problem as any untrusted input which is stored in a hash table (so that
> the attacker can use crafted keys) must be in the first place
> safeguarded against OOM condition by limiting the input count/size.

The problem isn't OOM, rather it is algorithmic complexity. With malicious
hash collisions the runtime lookup performance degrades to O(n) which can
cause scalability concerns in some cases.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

2020-10-27 Thread Peter Krempa
On Mon, Oct 26, 2020 at 16:08:34 +, Daniel Berrange wrote:
> On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
> > Glib's hash table provides basically the same functionality as our hash
> > table.
> > 
> > In most cases the only thing that remains in the virHash* wrappers is
> > NULL-checks of '@table' argument as glib's hash functions don't tolerate
> > NULL.
> > 
> > In case of iterators, we adapt the existing API of iterators to glibs to
> > prevent having rewrite all callers at this point.
> > 
> > Signed-off-by: Peter Krempa 
> > ---
> >  src/libvirt_private.syms |   4 -
> >  src/util/meson.build |   1 -
> >  src/util/virhash.c   | 416 ++-
> >  src/util/virhash.h   |   4 +-
> >  src/util/virhashcode.c   | 125 
> >  src/util/virhashcode.h   |  33 
> 
> Our hash code impl uses Murmurhash which makes some efforts to be
> robust against malicious inputs triggering collisons, notably using
> a random seed.
> 
> The new code uses  g_str_hash which is much weaker, and the API
> docs explicitly recommend against using it if the input can be from
> an untrusted user.

Yes, I've noticed that, but didn't consider it to be that much of a
problem as any untrusted input which is stored in a hash table (so that
the attacker can use crafted keys) must be in the first place
safeguarded against OOM condition by limiting the input count/size.

Apart from insertion we do have at least one place where lookup is based
on untrusted data so there is a possibility for bigger scope of impact.

Said that I don't really believe that the complexity vs impact ratio
makes it worth even for a completely static hashing function, but when
we can do better, we should.

> IOW, I don't think we should be removing our virhashcode impl. If
> anything we should upgrade our hash code impl to use SipHash which
> is used by perl, python, ruby, rust and more.

Okay, I'll leave virhashcode in. Regardless whether we have a "cool"
hashing function or not, devs need to be vigilant of use of any data
structure based on untrusted data though.



Re: [PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

2020-10-26 Thread Daniel P . Berrangé
On Mon, Oct 26, 2020 at 04:45:50PM +0100, Peter Krempa wrote:
> Glib's hash table provides basically the same functionality as our hash
> table.
> 
> In most cases the only thing that remains in the virHash* wrappers is
> NULL-checks of '@table' argument as glib's hash functions don't tolerate
> NULL.
> 
> In case of iterators, we adapt the existing API of iterators to glibs to
> prevent having rewrite all callers at this point.
> 
> Signed-off-by: Peter Krempa 
> ---
>  src/libvirt_private.syms |   4 -
>  src/util/meson.build |   1 -
>  src/util/virhash.c   | 416 ++-
>  src/util/virhash.h   |   4 +-
>  src/util/virhashcode.c   | 125 
>  src/util/virhashcode.h   |  33 

Our hash code impl uses Murmurhash which makes some efforts to be
robust against malicious inputs triggering collisons, notably using
a random seed.

The new code uses  g_str_hash which is much weaker, and the API
docs explicitly recommend against using it if the input can be from
an untrusted user.

IOW, I don't think we should be removing our virhashcode impl. If
anything we should upgrade our hash code impl to use SipHash which
is used by perl, python, ruby, rust and more.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[PATCH 10/13] util: hash: Reimplement virHashTable using GHashTable

2020-10-26 Thread Peter Krempa
Glib's hash table provides basically the same functionality as our hash
table.

In most cases the only thing that remains in the virHash* wrappers is
NULL-checks of '@table' argument as glib's hash functions don't tolerate
NULL.

In case of iterators, we adapt the existing API of iterators to glibs to
prevent having rewrite all callers at this point.

Signed-off-by: Peter Krempa 
---
 src/libvirt_private.syms |   4 -
 src/util/meson.build |   1 -
 src/util/virhash.c   | 416 ++-
 src/util/virhash.h   |   4 +-
 src/util/virhashcode.c   | 125 
 src/util/virhashcode.h   |  33 
 6 files changed, 107 insertions(+), 476 deletions(-)
 delete mode 100644 src/util/virhashcode.c
 delete mode 100644 src/util/virhashcode.h

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index f7b0d11ca2..9a5269fd0b 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2219,10 +2219,6 @@ virHashSteal;
 virHashUpdateEntry;


-# util/virhashcode.h
-virHashCodeGen;
-
-
 # util/virhook.h
 virHookCall;
 virHookInitialize;
diff --git a/src/util/meson.build b/src/util/meson.build
index 3b711b1dc2..99f72cb04a 100644
--- a/src/util/meson.build
+++ b/src/util/meson.build
@@ -36,7 +36,6 @@ util_sources = [
   'virgettext.c',
   'virgic.c',
   'virhash.c',
-  'virhashcode.c',
   'virhook.c',
   'virhostcpu.c',
   'virhostmem.c',
diff --git a/src/util/virhash.c b/src/util/virhash.c
index a5ab48dbf5..ed658714e8 100644
--- a/src/util/virhash.c
+++ b/src/util/virhash.c
@@ -21,42 +21,13 @@

 #include "virerror.h"
 #include "virhash.h"
-#include "viralloc.h"
 #include "virlog.h"
-#include "virhashcode.h"
-#include "virrandom.h"
-#include "virstring.h"
 #include "virobject.h"

 #define VIR_FROM_THIS VIR_FROM_NONE

 VIR_LOG_INIT("util.hash");

-#define MAX_HASH_LEN 8
-
-/* #define DEBUG_GROW */
-
-/*
- * A single entry in the hash table
- */
-typedef struct _virHashEntry virHashEntry;
-typedef virHashEntry *virHashEntryPtr;
-struct _virHashEntry {
-struct _virHashEntry *next;
-char *name;
-void *payload;
-};
-
-/*
- * The entire hash table
- */
-struct _virHashTable {
-virHashEntryPtr *table;
-uint32_t seed;
-size_t size;
-size_t nbElems;
-virHashDataFree dataFree;
-};

 struct _virHashAtomic {
 virObjectLockable parent;
@@ -76,13 +47,6 @@ static int virHashAtomicOnceInit(void)

 VIR_ONCE_GLOBAL_INIT(virHashAtomic);

-static size_t
-virHashComputeKey(const virHashTable *table, const char *name)
-{
-uint32_t value = virHashCodeGen(name, strlen(name), table->seed);
-return value % table->size;
-}
-

 /**
  * virHashNew:
@@ -95,18 +59,7 @@ virHashComputeKey(const virHashTable *table, const char 
*name)
 virHashTablePtr
 virHashNew(virHashDataFree dataFree)
 {
-virHashTablePtr table = NULL;
-
-table = g_new0(virHashTable, 1);
-
-table->seed = virRandomBits(32);
-table->size = 32;
-table->nbElems = 0;
-table->dataFree = dataFree;
-
-table->table = g_new0(virHashEntryPtr, table->size);
-
-return table;
+return g_hash_table_new_full(g_str_hash, g_str_equal, g_free, dataFree);
 }


@@ -138,66 +91,6 @@ virHashAtomicDispose(void *obj)
 }


-/**
- * virHashGrow:
- * @table: the hash table
- * @size: the new size of the hash table
- *
- * resize the hash table
- *
- * Returns 0 in case of success, -1 in case of failure
- */
-static int
-virHashGrow(virHashTablePtr table, size_t size)
-{
-size_t oldsize, i;
-virHashEntryPtr *oldtable;
-
-#ifdef DEBUG_GROW
-size_t nbElem = 0;
-#endif
-
-if (table == NULL)
-return -1;
-if (size < 8)
-return -1;
-if (size > 8 * 2048)
-return -1;
-
-oldsize = table->size;
-oldtable = table->table;
-if (oldtable == NULL)
-return -1;
-
-table->table = g_new0(virHashEntryPtr, size);
-table->size = size;
-
-for (i = 0; i < oldsize; i++) {
-virHashEntryPtr iter = oldtable[i];
-while (iter) {
-virHashEntryPtr next = iter->next;
-size_t key = virHashComputeKey(table, iter->name);
-
-iter->next = table->table[key];
-table->table[key] = iter;
-
-#ifdef DEBUG_GROW
-nbElem++;
-#endif
-iter = next;
-}
-}
-
-VIR_FREE(oldtable);
-
-#ifdef DEBUG_GROW
-VIR_DEBUG("virHashGrow : from %d to %d, %ld elems", oldsize,
-  size, nbElem);
-#endif
-
-return 0;
-}
-
 /**
  * virHashFree:
  * @table: the hash table
@@ -208,76 +101,12 @@ virHashGrow(virHashTablePtr table, size_t size)
 void
 virHashFree(virHashTablePtr table)
 {
-size_t i;
-
 if (table == NULL)
 return;

-for (i = 0; i < table->size; i++) {
-virHashEntryPtr iter = table->table[i];
-while (iter) {
-virHashEntryPtr next = iter->next;
-
-if (table->dataFree)
-table->dataFree(iter->payload);
-g_free(iter->name);
-VIR_FREE(iter);