Re: Fwd: Re: Hash collision vectors in APR?

2012-02-23 Thread William A. Rowe Jr.
And responding;

On 2/23/2012 8:27 PM, William A. Rowe Jr. wrote:
> Forwarded for Kurt, as he's not subscribed.
> 
>  Original Message ----
> Subject: Re: Hash collision vectors in APR?
> Date: Thu, 23 Feb 2012 18:32:30 -0700
> From: Kurt Seifried 
> To: William A. Rowe Jr. 
> CC: APR Developer List , Stefan Fritsch 
> ,
> "Steven M. Christey" 
> 
> Apologies for the delayed reply.
> 
> CC'ing Steve since I can't actually remove or edit CVE's, just assign
> them.
> 
> On 02/22/2012 09:49 AM, William A. Rowe Jr. wrote:
>> After extensive consultation with the security projects of various
>> APR consumers, it's apparent that there are no actual
>> vulnerabilities to be exploited here.  Contrary to Mr Seifreid's
>> confusion, the recent code changes reflect a possibility of
>> mitigating potential hash collisions, but certainly do not and can
>> not eliminate such risks, and it is up to the developer to select
>> appropriate storage and lookup mechansims for their specific
>> problem domain.
> 
>> These changes do not represent either a security DEFECT nor any
>> actual security FIX.  The APR Project dis-acknowledges the
>> assignment of CVE-2012-0840 as erroneous, and invalid.  Kurt, since
>> you created the defect, please edit it appropriately.
>> secur...@apache.org is always happy to consult in order to avoid
>> future errors and misinformation.
> 
>> Stefan, please revert your miscommit.  In the future, please run
>> such things past secur...@apache.org before applying inaccurate
>> external assignments.
> 
> So as I now understand it APR doesn't expose itself in an exploitable
> manner, ergo the hash function, while technically vulnerable, cannot
> really be exploited in any current way, is this correct? Does this
> also take into account future changes/uses that may expose it
> potentially (or is that just not possible)? If you can confirm this
> then I guess the ball is in Steve's court (I'm also not sure how the
> CVE rules/etc. apply in this specific instance).

What is it 'vulnerable' to?  In the c language, I can code 'while (1) ;'
but that doesn't represent a vulnerability in the c language, simply
stupid programming.

The enhanced hash behavior seeks to avoid predictable hash collisions,
but it won't substitute for the intelligent application of apr_hash_t
objects.  It is similar to earlier language enhancements based on the
hash research published back in the early 2000's, but was in reaction
to other languages revisiting this computational issue, NOT to any
reported vulnerability.

This does not in and of itself fix any known SECURITY defect.  The
previous implementation was neither vulnerable nor invulnerable.  Simply,
there is no exploitable code which accepted an unlimited number of new
hash entries submitted by an untrusted entity into an apr_hash_t, as
happened in various cgi form processing applications written in some
other languages.

CVE's aren't used to track all possible future changes/uses/applications
of computing systems, that would be a rather absurd goal.  The Securina
report http://secunia.com/advisories/47862 in particular demonstrates the
disservice that Mitre agents can cause when creating CVE's without research.

CC'ing Mark who might be able to help untangle the mess introduced here.


Fwd: Re: Hash collision vectors in APR?

2012-02-23 Thread William A. Rowe Jr.
Forwarded for Kurt, as he's not subscribed.

 Original Message 
Subject: Re: Hash collision vectors in APR?
Date: Thu, 23 Feb 2012 18:32:30 -0700
From: Kurt Seifried 
To: William A. Rowe Jr. 
CC: APR Developer List , Stefan Fritsch ,
"Steven M. Christey" 

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Apologies for the delayed reply.

CC'ing Steve since I can't actually remove or edit CVE's, just assign
them.

On 02/22/2012 09:49 AM, William A. Rowe Jr. wrote:
> After extensive consultation with the security projects of various
> APR consumers, it's apparent that there are no actual
> vulnerabilities to be exploited here.  Contrary to Mr Seifreid's
> confusion, the recent code changes reflect a possibility of
> mitigating potential hash collisions, but certainly do not and can
> not eliminate such risks, and it is up to the developer to select
> appropriate storage and lookup mechansims for their specific
> problem domain.
> 
> These changes do not represent either a security DEFECT nor any
> actual security FIX.  The APR Project dis-acknowledges the
> assignment of CVE-2012-0840 as erroneous, and invalid.  Kurt, since
> you created the defect, please edit it appropriately.
> secur...@apache.org is always happy to consult in order to avoid
> future errors and misinformation.
> 
> Stefan, please revert your miscommit.  In the future, please run
> such things past secur...@apache.org before applying inaccurate
> external assignments.

So as I now understand it APR doesn't expose itself in an exploitable
manner, ergo the hash function, while technically vulnerable, cannot
really be exploited in any current way, is this correct? Does this
also take into account future changes/uses that may expose it
potentially (or is that just not possible)? If you can confirm this
then I guess the ball is in Steve's court (I'm also not sure how the
CVE rules/etc. apply in this specific instance).

- -- 
Kurt Seifried Red Hat Security Response Team (SRT)
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJPRuiqAAoJEBYNRVNeJnmTAgIQAMX99REZXiwSh1mFX4foYjss
eBy2kz1ijFtFV2rH36F2eqeEPWFo82MrLDjGDH6j1zxrKiaUnziJ8pbKm/zcpF/G
ub3FS+8lzKQ3fxI/gJAPSRCoMZwx98PI8u6YEV96WU0m6UTxNQTGcRphUhVLMRze
gFLg6aOTW7kSoWZSbexQ9M8MmXR0E0HWQL2uzxLnZUi+s5AYbIV7WntEw9iQEfNr
uGromu2UYkjZVzjU2o54K/3bWUUmuAHBzVpYj91pGvRZPHDclBOBiQgtg3TNceRZ
LQ6vU2NWuciqr/0Hz41MHB70zp/zIHLr1xbuFgXX7lOVIVGEszY79BXS5iXNpvRi
MZvygEmfWmSQ/hxJPN6yjD0aXUd7+WpnUzVbOZlAX3sB3N02TrfLzDXnugQd06Lu
G8/OPq+Gi5UISDjST/DX63ke+tV5q7lk2oswuhq8ITeQ+4fuDggEklO4jgdqCH7C
XBxe/Z6023mPqei3Ot0Bx/IrbJlXLyWMnbE2efjjb+mAg6yC8tjp93QBRJNkQ/4n
2YvTvIHS+frVkCWfKl4Y7NFwADb6EKWXfdMPd43kufDcogtbWok19/rZ+JercM4K
oCk7VnIZ2qwHUUU7KCBtXUxA+rW5qXoqgbXOzmrTMkmBfhYY7UJ+kiYEVobkRJO3
SGUF5xhBbZnkD4bQSTSi
=0OpC
-END PGP SIGNATURE-


RE: Hash collision vectors in APR?

2012-02-22 Thread Luke Meyer
Funny how things escalate. Looks like someone turned this:

> Should we add some randomization to prevent abuse?

Into this:

http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-0840
http://secunia.com/advisories/47862
"The vulnerability is caused due to an error within a hash generation function 
when hashing form posts and updating a hash table. This can be exploited to 
cause a hash collision resulting in high CPU consumption via a specially 
crafted form sent in a HTTP POST request."

Reeeallly?? I guess I missed the part where any actual error or exploit was 
found...


Re: Hash collision vectors in APR?

2012-02-22 Thread William A. Rowe Jr.
On 1/5/2012 11:45 AM, William A. Rowe Jr. wrote:
> http://www.nruns.com/_downloads/advisory28122011.pdf
> 
> Should we add some randomization to prevent abuse?
> 
> It's hard to anticipate how folks might leverage apr, and how malicious
> folks might then seek to exploit computational workload vectors.

After extensive consultation with the security projects of various APR
consumers, it's apparent that there are no actual vulnerabilities to be
exploited here.  Contrary to Mr Seifreid's confusion, the recent code
changes reflect a possibility of mitigating potential hash collisions,
but certainly do not and can not eliminate such risks, and it is up to
the developer to select appropriate storage and lookup mechansims for
their specific problem domain.

These changes do not represent either a security DEFECT nor any actual
security FIX.  The APR Project dis-acknowledges the assignment of
CVE-2012-0840 as erroneous, and invalid.  Kurt, since you created the
defect, please edit it appropriately.  secur...@apache.org is always
happy to consult in order to avoid future errors and misinformation.

Stefan, please revert your miscommit.  In the future, please run such
things past secur...@apache.org before applying inaccurate external
assignments.
--- Begin Message ---
Author: sf
Date: Mon Feb 13 18:51:43 2012
New Revision: 1243653

URL: http://svn.apache.org/viewvc?rev=1243653&view=rev
Log:
Add reference to CVE-2012-0840

Modified:
apr/apr/branches/1.4.x/CHANGES

Modified: apr/apr/branches/1.4.x/CHANGES
URL: 
http://svn.apache.org/viewvc/apr/apr/branches/1.4.x/CHANGES?rev=1243653&r1=1243652&r2=1243653&view=diff
==
--- apr/apr/branches/1.4.x/CHANGES [utf-8] (original)
+++ apr/apr/branches/1.4.x/CHANGES [utf-8] Mon Feb 13 18:51:43 2012
@@ -8,7 +8,7 @@ Changes for APR 1.4.6
   *) Flush write buffer before truncate call on a file.
  [Mladen Turk]
 
-  *) Security: oCERT-2011-003
+  *) Security: CVE-2012-0840, oCERT-2011-003
  Randomise hashes by providing a seed. 
  [Bojan Smojver, Branko Čibej, Ruediger Pluem et al.]
 



--- End Message ---


Re: Hash collision vectors in APR?

2012-01-27 Thread Bojan Smojver

--- Original message ---

From: Branko Čibej



But the original code is already wrong in assuming that res->hash_func
and overlay->hash_func are the same, so in fact you're silently fixing a
bug with the current change. :)


Surprisingly enough, I actually thought about this bit. Surely two hashes 
tables with different hash functions would have different hash values.

But, we can still use your optimisation with reverse xor (which I did not think 
of at all). We can check whether both input hashes have the same hash function 
pointer and use your optimisation in that case.

Once again, thanks for your input!

--
Bojan

Re: Hash collision vectors in APR?

2012-01-27 Thread Branko Čibej
On 26.01.2012 22:59, Bojan Smojver wrote:
> On Fri, 2012-01-27 at 00:36 +1100, Bojan Smojver wrote:
>> Thanks, will implement.
> Attached. If nobody comes up with regressions or other problems with
> this approach, I'll commit it to trunk.
>
> Thanks everyone for their input.
>

The only thing that pops up is that this part:
> @@ -468,7 +476,8 @@
> for (k = 0; k <= overlay->max; k++) {
> for (iter = overlay->array[k]; iter; iter = iter->next) {
> - i = iter->hash & res->max;
> + hash = res->seed ^ res->hash_func(iter->key, &iter->klen);
> + i = hash & res->max;
> for (ent = res->array[i]; ent; ent = ent->next) {
> if ((ent->klen == iter->klen) &&
> (memcmp(ent->key, iter->key, iter->klen) == 0)) {

is obviously a lot slower than it used to be, because hash_func is not
trivial. The following would give the same result:

hash = iter->hash ^ overlay->seed ^ res->seed

Since XOR is symmetric, the "iter->hash ^ overlay->seed" part
de-randomizes the iterator seed.

But the original code is already wrong in assuming that res->hash_func
and overlay->hash_func are the same, so in fact you're silently fixing a
bug with the current change. :)

-- Brane



Re: Hash collision vectors in APR?

2012-01-26 Thread Bojan Smojver
On Fri, 2012-01-27 at 00:36 +1100, Bojan Smojver wrote:
> Thanks, will implement.

Attached. If nobody comes up with regressions or other problems with
this approach, I'll commit it to trunk.

Thanks everyone for their input.

-- 
Bojan
Index: tables/apr_hash.c
===
--- tables/apr_hash.c	(revision 1235978)
+++ tables/apr_hash.c	(working copy)
@@ -18,6 +18,7 @@
 
 #include "apr_general.h"
 #include "apr_pools.h"
+#include "apr_time.h"
 
 #include "apr_hash.h"
 
@@ -75,7 +76,7 @@
 apr_pool_t  *pool;
 apr_hash_entry_t   **array;
 apr_hash_index_t iterator;  /* For apr_hash_first(NULL, ...) */
-unsigned int count, max;
+unsigned int count, max, seed;
 apr_hashfunc_t   hash_func;
 apr_hash_entry_t*free;  /* List of recycled entries */
 };
@@ -95,13 +96,18 @@
 APR_DECLARE(apr_hash_t *) apr_hash_make(apr_pool_t *pool)
 {
 apr_hash_t *ht;
+apr_time_t now = apr_time_now();
+
 ht = apr_palloc(pool, sizeof(apr_hash_t));
 ht->pool = pool;
 ht->free = NULL;
 ht->count = 0;
 ht->max = INITIAL_MAX;
+ht->seed = (unsigned int)((now >> 32) ^ now ^ (apr_uintptr_t)pool ^
+  (apr_uintptr_t)ht ^ (apr_uintptr_t)&now) - 1;
 ht->array = alloc_array(ht, ht->max);
 ht->hash_func = apr_hashfunc_default;
+
 return ht;
 }
 
@@ -280,7 +286,7 @@
 apr_hash_entry_t **hep, *he;
 unsigned int hash;
 
-hash = ht->hash_func(key, &klen);
+hash = ht->seed ^ ht->hash_func(key, &klen);
 
 /* scan linked list */
 for (hep = &ht->array[hash & ht->max], he = *hep;
@@ -322,6 +328,7 @@
 ht->free = NULL;
 ht->count = orig->count;
 ht->max = orig->max;
+ht->seed = orig->seed;
 ht->hash_func = orig->hash_func;
 ht->array = (apr_hash_entry_t **)((char *)ht + sizeof(apr_hash_t));
 
@@ -419,7 +426,7 @@
 apr_hash_entry_t *new_vals = NULL;
 apr_hash_entry_t *iter;
 apr_hash_entry_t *ent;
-unsigned int i,j,k;
+unsigned int i, j, k, hash;
 
 #if APR_POOL_DEBUG
 /* we don't copy keys and values, so it's necessary that
@@ -447,6 +454,7 @@
 if (base->count + overlay->count > res->max) {
 res->max = res->max * 2 + 1;
 }
+res->seed = base->seed;
 res->array = alloc_array(res, res->max);
 if (base->count + overlay->count) {
 new_vals = apr_palloc(p, sizeof(apr_hash_entry_t) *
@@ -468,7 +476,8 @@
 
 for (k = 0; k <= overlay->max; k++) {
 for (iter = overlay->array[k]; iter; iter = iter->next) {
-i = iter->hash & res->max;
+hash = res->seed ^ res->hash_func(iter->key, &iter->klen);
+i = hash & res->max;
 for (ent = res->array[i]; ent; ent = ent->next) {
 if ((ent->klen == iter->klen) &&
 (memcmp(ent->key, iter->key, iter->klen) == 0)) {
@@ -486,7 +495,7 @@
 new_vals[j].klen = iter->klen;
 new_vals[j].key = iter->key;
 new_vals[j].val = iter->val;
-new_vals[j].hash = iter->hash;
+new_vals[j].hash = hash;
 new_vals[j].next = res->array[i];
 res->array[i] = &new_vals[j];
 res->count++;
Index: test/testhash.c
===
--- test/testhash.c	(revision 1235978)
+++ test/testhash.c	(working copy)
@@ -438,6 +438,79 @@
 ABTS_STR_EQUAL(tc, "#entries 5\n", StrArray[5]);
 }
 
+static void overlay_fetch(abts_case *tc, void *data)
+{
+apr_hash_t *base = NULL;
+apr_hash_t *overlay = NULL;
+apr_hash_t *result = NULL;
+int count;
+
+base = apr_hash_make(p);
+overlay = apr_hash_make(p);
+ABTS_PTR_NOTNULL(tc, base);
+ABTS_PTR_NOTNULL(tc, overlay);
+
+apr_hash_set(base, "base1", APR_HASH_KEY_STRING, "value1");
+apr_hash_set(base, "base2", APR_HASH_KEY_STRING, "value2");
+apr_hash_set(base, "base3", APR_HASH_KEY_STRING, "value3");
+apr_hash_set(base, "base4", APR_HASH_KEY_STRING, "value4");
+apr_hash_set(base, "base5", APR_HASH_KEY_STRING, "value5");
+
+apr_hash_set(overlay, "overlay1", APR_HASH_KEY_STRING, "value1");
+apr_hash_set(overlay, "overlay2", APR_HASH_KEY_STRING, "value2");
+apr_hash_set(overlay, "overlay3", APR_HASH_KEY_STRING, "value3");
+apr_hash_set(overlay, "overlay4", APR_HASH_KEY_STRING, "value4");
+apr_hash_set(overlay, "overlay5", APR_HASH_KEY_STRING, "value5");
+
+result = apr_hash_overlay(p, overlay, base);
+
+count = apr_hash_count(result);
+ABTS_INT_EQUAL(tc, 10, count);
+
+ABTS_STR_EQUAL(tc, "value1",
+   apr_hash_get(result, "base1", APR_HASH_KEY_STRING));
+ABTS_STR_EQUAL(tc, "value2",
+   apr_hash_get(result, "base2", APR_HASH_KEY_STRING));
+ABTS_STR_EQUAL(tc, "value3",
+   apr_hash_get(result, "base3", APR_HASH_KEY_STRING));
+A

Re: Hash collision vectors in APR?

2012-01-26 Thread Bojan Smojver

--- Original message ---

From: Branko Čibej



Correct. I intentionally left out the implementation, since i don't
really have an opinion about how it should be done. I just wanted to
point out that there's a simple way to add per-table randomization
without changing any of the existing APIs, or adding new ones.


Thanks, will implement. We don't even need a function for this. A simple xor 
against the seed will do.

--
Bojan

Re: Hash collision vectors in APR?

2012-01-26 Thread Branko Čibej
On 26.01.2012 12:15, Bojan Smojver wrote:
> --- Original message ---
>> From: Branko Čibej
>
>> - hash = ht->hash_func(key, &klen);
>> + hash = randomize_hash(ht, ht->hash_func(key, &klen);
>>
>> Completely private change that leaves hash_func_t unchanged.
>
> Interesting approach. So, randomize_hash() would then perturb (xor or
> something) what the hash function returns based on the seed, right?

Correct. I intentionally left out the implementation, since i don't
really have an opinion about how it should be done. I just wanted to
point out that there's a simple way to add per-table randomization
without changing any of the existing APIs, or adding new ones.

-- Brane



Re: Hash collision vectors in APR?

2012-01-26 Thread Bojan Smojver

--- Original message ---

From: Branko Čibej



- hash = ht->hash_func(key, &klen);
+ hash = randomize_hash(ht, ht->hash_func(key, &klen);

Completely private change that leaves hash_func_t unchanged.


Interesting approach. So, randomize_hash() would then perturb (xor or 
something) what the hash function returns based on the seed, right?

--
Bojan

Re: Hash collision vectors in APR?

2012-01-26 Thread Branko Čibej
On 06.01.2012 00:05, Bojan Smojver wrote:
> On Fri, 2012-01-06 at 09:48 +1100, Bojan Smojver wrote:
>> The apr_hashfunc_t function prototype would then most likely have to
>> change. We'd probably need to pass the hash itself into it, which
>> would then hold the per-hash seed. Right?
> Actually, that would not be a good plan. A custom hash function would
> not be able to see inside the hash structure, because it's opaque. Most
> likely, the seed itself would have to be passed around as the third
> argument. Any other ideas?

- hash = ht->hash_func(key, &klen);
+ hash = randomize_hash(ht, ht->hash_func(key, &klen);

Completely private change that leaves hash_func_t unchanged.

-- Brane


Re: Hash collision vectors in APR?

2012-01-17 Thread Bojan Smojver

--- Original message ---

From: Joe Orton



   (hash(key) * ht->pure_random_number) % ht->max

where ht->max is 15 by default.  So you "merely" have to increase the
size of the input by 15 to produce at least the same overhead; the
attacker must generate 15n keys to ensure they hit all the buckets.  The
attack is still quadratic time.


At hash size of 15 we are probably not going to hit computational limits. 
However, if the hash is say 511 in size (because it will grow and is always 
the size of a power of two minus one), the attacker will then (according to 
the above maths) have to provide 500 times more input to hit the problem. 
That may be more difficult. Right?


--
Bojan 


Re: Hash collision vectors in APR?

2012-01-16 Thread Joe Orton
On Thu, Jan 05, 2012 at 01:35:50PM -0500, Jeff Trawick wrote:
> We don't want to say "go fish" to APR applications if our hash plus
> their vector results in an exploit.

I'll bite.

1) Chained hash tables have quadratic worst case insertion time, 
O(n**2).  The APR hash table (apr_hash_*) implementation shares that 
characteristic.

2) APR tables (apr_table_*) provide a data structure with constant time 
worst case insertion, O(1).

3) If applications are using apr_hash_* where they should be doing 
apr_table_*, the apps should be fixed to use the correct API.

4) Mixing some per-hash randomness into the hash is mostly futile.  The 
attack is effective if you can generate "n" keys which hit the same hash 
bucket.  The hash bucket chosen is (hash(key) % ht->max).

The proposal is to change the hash lookup function to, say: 

   (hash(key) * ht->pure_random_number) % ht->max

where ht->max is 15 by default.  So you "merely" have to increase the 
size of the input by 15 to produce at least the same overhead; the 
attacker must generate 15n keys to ensure they hit all the buckets.  The 
attack is still quadratic time.

Reducing the impact by a constant factor might make a difference to some 
app, but it's likely that app needs to switch to use apr_table_* or 
otherwise limiting untrusted input more tightly anyway.

OK, now flame me to death ;) Joe


Re: Hash collision vectors in APR?

2012-01-14 Thread Bojan Smojver
On Sat, 2012-01-14 at 14:50 +1100, Bojan Smojver wrote:
> Yeah, that's probably what we'll have to do.

Gone bold and committed that in r1231605. Feel free to improve,
criticise, rip out etc.

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-13 Thread Bojan Smojver

--- Original message ---

From: Chris Darroch



I'd personally rather see the use of it dropped entirely in all cases
and the code Bojan worked on as the fallback case always used, since
it adds no significant complexity and appears in line with other 
solutions,

such as the one in Perl.  And if I understand Ben Laurie's note, he was
indicating that truly random entropy (as from /dev/random) is probably
overkill here anyway.  Does that make sense?


Yeah, that's probably what we'll have to do. AFAICT, Perl generates random 
seed once per process and we'd do it once per hash, so that's better 
already.


--
Bojan 


Re: Hash collision vectors in APR?

2012-01-13 Thread Chris Darroch

Philip Martin wrote:


I'm not sure how Linux distributors configure things, but APR itself
defaults to /dev/urandom.


  Yup, Bojan corrected me on that too!


I don't think apr_hash_make should be trying to second guess
apr_generate_random_bytes.


  I suppose my strong hesitation about handing off to
apr_generate_random_bytes() is that there are ways it can block, and I
think not just with /dev/random but with other non-DEV_RANDOM cases also.

  Regardless, apr_hash_make() has never blocked before, so this would be
a change in behaviour even for those who are using apr_generate_random_bytes()
elsewhere in their code.  If it's documented, I can see that might be
fine for APR 2.0, or for a new apr_hash_make_random() function in 1.4.
But I'm a little nervous about causing such an apparently simple and
well-known function ("make a new hash") to block when that's never been
its prior behaviour.


  This whole topic stems from the problem of an abstraction (a hash
store) which normally works quickly to store and retrieve arbitrary user
input (e.g., CGI arguments) devolving into a slow linked-list operation
given pathological or malicious data.  That seems like a classic case
of a "leaky abstraction" to me -- something which usually works like
magic but which can unexpectedly not work well at all.[1]

  So, for me at least, addressing that problem by introducing a new
potential leaky abstraction -- an apparently simple call which always
worked quickly before, but which may now block indefinitely on certain
systems -- seems like heading in the wrong direction.

  If there are concerns about second-guessing apr_generate_random_bytes(),
I'd personally rather see the use of it dropped entirely in all cases
and the code Bojan worked on as the fallback case always used, since
it adds no significant complexity and appears in line with other solutions,
such as the one in Perl.  And if I understand Ben Laurie's note, he was
indicating that truly random entropy (as from /dev/random) is probably
overkill here anyway.  Does that make sense?

Chris.

1. http://www.joelonsoftware.com/articles/LeakyAbstractions.html

--
GPG Key ID: 088335A9
GPG Key Fingerprint: 86CD 3297 7493 75BC F820  6715 F54F E648 0883 35A9


Re: Hash collision vectors in APR?

2012-01-13 Thread Chris Darroch

Bojan Smojver wrote:


On Fri, 2012-01-13 at 19:57 +1100, Bojan Smojver wrote:

This probably wouldn't work as expected. If you had /dev/random
selected for DEV_RANDOM, the if would fail and srand()/rand() bit
would never get called. No?


Like attached maybe?


  Yes, that looks better than mine, thanks!

Chris.

--
GPG Key ID: 088335A9
GPG Key Fingerprint: 86CD 3297 7493 75BC F820  6715 F54F E648 0883 35A9



Re: Hash collision vectors in APR?

2012-01-13 Thread Philip Martin
Chris Darroch  writes:

>   The other question, I suppose, is whether there's a hit to performance
> from apr_generate_random_bytes() call.  Without having tested explicitly,
> it looks like the default case for modern Linux is APR_HAS_RANDOM=1 and
> DEV_RANDOM=/dev/random, with /dev/random blocking when there's no entropy.
> That might, I think, cause apr_hash_make() to block unexpectedly for
> callers.

I'm not sure how Linux distributors configure things, but APR itself
defaults to /dev/urandom.  From

http://svn.apache.org/repos/asf/apr/apr/trunk/configure.in

# prefer /dev/arandom, which does; see random(4).
for f in /dev/arandom /dev/urandom /dev/random; do

As far as I can tell on FreeBSD /dev/random doesn't block:

http://www.freebsd.org/cgi/man.cgi?query=random&apropos=0&sektion=4&manpath=FreeBSD+9.0-RELEASE&arch=default&format=html

I don't think apr_hash_make should be trying to second guess
apr_generate_random_bytes.

-- 
Philip


Re: Hash collision vectors in APR?

2012-01-13 Thread Bojan Smojver
On Fri, 2012-01-13 at 19:57 +1100, Bojan Smojver wrote:
> This probably wouldn't work as expected. If you had /dev/random
> selected for DEV_RANDOM, the if would fail and srand()/rand() bit
> would never get called. No?

Like attached maybe?

-- 
Bojan
Index: tables/apr_hash.c
===
--- tables/apr_hash.c	(revision 1230868)
+++ tables/apr_hash.c	(working copy)
@@ -18,6 +18,10 @@
 
 #include "apr_general.h"
 #include "apr_pools.h"
+#include "apr_time.h"
+#if APR_HAVE_STDLIB_H
+#include  /* for rand, srand */
+#endif
 
 #include "apr_hash.h"
 
@@ -75,7 +79,7 @@
 apr_pool_t  *pool;
 apr_hash_entry_t   **array;
 apr_hash_index_t iterator;  /* For apr_hash_first(NULL, ...) */
-unsigned int count, max;
+unsigned int count, max, seed;
 apr_hashfunc_t   hash_func;
 apr_hash_entry_t*free;  /* List of recycled entries */
 };
@@ -95,13 +99,27 @@
 APR_DECLARE(apr_hash_t *) apr_hash_make(apr_pool_t *pool)
 {
 apr_hash_t *ht;
+apr_time_t now;
+
 ht = apr_palloc(pool, sizeof(apr_hash_t));
 ht->pool = pool;
 ht->free = NULL;
 ht->count = 0;
 ht->max = INITIAL_MAX;
+#if APR_HAS_RANDOM && defined(DEV_RANDOM)
+if (!strcmp(DEV_RANDOM, "/dev/random") ||
+apr_generate_random_bytes(
+(unsigned char *)&ht->seed, sizeof(ht->seed)) != APR_SUCCESS) {
+#endif
+now = apr_time_now();
+srand((unsigned int)((now >> 32) ^ now ^ (apr_uintptr_t)ht));
+ht->seed = (unsigned int)(rand());
+#if APR_HAS_RANDOM && defined(DEV_RANDOM)
+}
+#endif
 ht->array = alloc_array(ht, ht->max);
-ht->hash_func = apr_hashfunc_default;
+ht->hash_func = NULL;
+
 return ht;
 }
 
@@ -201,10 +219,10 @@
 ht->max = new_max;
 }
 
-APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
-  apr_ssize_t *klen)
+static unsigned int apr_hashfunc_default_internal(const char *char_key,
+  apr_ssize_t *klen,
+  unsigned int hash)
 {
-unsigned int hash = 0;
 const unsigned char *key = (const unsigned char *)char_key;
 const unsigned char *p;
 apr_ssize_t i;
@@ -246,7 +264,7 @@
  *
  *  -- Ralf S. Engelschall 
  */
- 
+
 if (*klen == APR_HASH_KEY_STRING) {
 for (p = key; *p; p++) {
 hash = hash * 33 + *p;
@@ -262,6 +280,11 @@
 return hash;
 }
 
+APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
+  apr_ssize_t *klen)
+{
+return apr_hashfunc_default_internal(char_key, klen, 0);
+}
 
 /*
  * This is where we keep the details of the hash function and control
@@ -280,7 +303,10 @@
 apr_hash_entry_t **hep, *he;
 unsigned int hash;
 
-hash = ht->hash_func(key, &klen);
+if (ht->hash_func)
+hash = ht->hash_func(key, &klen);
+else
+hash = apr_hashfunc_default_internal(key, &klen, ht->seed);
 
 /* scan linked list */
 for (hep = &ht->array[hash & ht->max], he = *hep;
@@ -322,6 +348,7 @@
 ht->free = NULL;
 ht->count = orig->count;
 ht->max = orig->max;
+ht->seed = orig->seed;
 ht->hash_func = orig->hash_func;
 ht->array = (apr_hash_entry_t **)((char *)ht + sizeof(apr_hash_t));
 


Re: Hash collision vectors in APR?

2012-01-13 Thread Bojan Smojver
On Thu, 2012-01-12 at 19:58 -0800, Chris Darroch wrote:
> +if (strcmp(DEV_RANDOM, "/dev/random") != 0 &&
> +apr_generate_random_bytes((unsigned char *)&ht->seed,
> +  sizeof(ht->seed)) != APR_SUCCESS) 

This probably wouldn't work as expected. If you had /dev/random selected
for DEV_RANDOM, the if would fail and srand()/rand() bit would never get
called. No?

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-12 Thread Bojan Smojver

--- Original message ---

From: Chris Darroch



Please test and improve!


Thanks Chris!

--
Bojan 


Re: Hash collision vectors in APR?

2012-01-12 Thread Chris Darroch

Bojan Smojver wrote:


Given we use CTR, maybe I can just commit something like this to trunk
and others can then commit improvements over the top, if desired?


  I'm an httpd guy but not a APR one, so it's really up to you!

  Here's a slight variant which, I believe, bypasses
apr_generate_random_bytes() unless you've got /dev/urandom or /dev/arandom.

  I think that might be useful because, looking at apr_generate_random_bytes()
in the cases where you don't have DEV_RANDOM set but do have things like
HAVE_EGD or HAVE_TRUERAND, it seems like most of these other options,
like /dev/random, are set to block, e.g., comments like:

   req[0] = 2; /* We'll block for now. */

   /* this will increase the startup time of the server, unfortunately...
* (generating 20 bytes takes about 8 seconds)

  Anyway, it seems like the best might just be to skip the thing
unless we're guaranteed a non-blocking action.

  I also tried to fix a compiler warning on the cast from the pointer ht
into an integer ... others should test on differently-sized systems,
but it was throwing "cast from pointer to integer of different size"
for an older 32-bit system with the cast to apr_time_t.  Please test
and improve!  Thanks,

Chris.

===
--- tables/apr_hash.c.orig  2012-01-10 11:47:17.0 -0800
+++ tables/apr_hash.c   2012-01-12 19:55:43.0 -0800
@@ -18,6 +18,10 @@

#include "apr_general.h"
#include "apr_pools.h"
+#include "apr_time.h"
+#if APR_HAVE_STDLIB_H
+#include  /* for rand, srand */
+#endif

#include "apr_hash.h"

@@ -75,7 +79,7 @@
apr_pool_t  *pool;
apr_hash_entry_t   **array;
apr_hash_index_t iterator;  /* For apr_hash_first(NULL, ...) */
-unsigned int count, max;
+unsigned int count, max, seed;
apr_hashfunc_t   hash_func;
apr_hash_entry_t*free;  /* List of recycled entries */
};
@@ -95,13 +99,28 @@
APR_DECLARE(apr_hash_t *) apr_hash_make(apr_pool_t *pool)
{
apr_hash_t *ht;
+apr_time_t now;
+
ht = apr_palloc(pool, sizeof(apr_hash_t));
ht->pool = pool;
ht->free = NULL;
ht->count = 0;
ht->max = INITIAL_MAX;
+#if APR_HAS_RANDOM
+#ifdef DEV_RANDOM
+if (strcmp(DEV_RANDOM, "/dev/random") != 0 &&
+apr_generate_random_bytes((unsigned char *)&ht->seed,
+  sizeof(ht->seed)) != APR_SUCCESS)
+#endif
+#endif
+{
+now = apr_time_now();
+srand((unsigned int)((now >> 32) ^ now ^ (apr_uintptr_t)ht));
+ht->seed = (unsigned int)(rand());
+}
ht->array = alloc_array(ht, ht->max);
-ht->hash_func = apr_hashfunc_default;
+ht->hash_func = NULL;
+
return ht;
}

@@ -201,10 +220,10 @@
ht->max = new_max;
}

-APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
-  apr_ssize_t *klen)
+static unsigned int apr_hashfunc_default_internal(const char *char_key,
+  apr_ssize_t *klen,
+  unsigned int hash)
{
-unsigned int hash = 0;
const unsigned char *key = (const unsigned char *)char_key;
const unsigned char *p;
apr_ssize_t i;
@@ -246,7 +265,7 @@
 *
 *  -- Ralf S. Engelschall 
 */
- 
+

if (*klen == APR_HASH_KEY_STRING) {
for (p = key; *p; p++) {
hash = hash * 33 + *p;
@@ -262,6 +281,11 @@
return hash;
}

+APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
+  apr_ssize_t *klen)
+{
+return apr_hashfunc_default_internal(char_key, klen, 0);
+}

/*
 * This is where we keep the details of the hash function and control
@@ -280,7 +304,10 @@
apr_hash_entry_t **hep, *he;
unsigned int hash;

-hash = ht->hash_func(key, &klen);
+if (ht->hash_func)
+hash = ht->hash_func(key, &klen);
+else
+hash = apr_hashfunc_default_internal(key, &klen, ht->seed);

/* scan linked list */
for (hep = &ht->array[hash & ht->max], he = *hep;
@@ -322,6 +349,7 @@
ht->free = NULL;
ht->count = orig->count;
ht->max = orig->max;
+ht->seed = orig->seed;
ht->hash_func = orig->hash_func;
ht->array = (apr_hash_entry_t **)((char *)ht + sizeof(apr_hash_t));



Re: Hash collision vectors in APR?

2012-01-12 Thread Bojan Smojver
On Wed, 2012-01-11 at 17:50 +1100, Bojan Smojver wrote:
> We could also whack ht into the xor with time and use it as seed to 
> srand(). 

Attached.

Given we use CTR, maybe I can just commit something like this to trunk
and others can then commit improvements over the top, if desired?

-- 
Bojan
Index: tables/apr_hash.c
===
--- tables/apr_hash.c	(revision 1230868)
+++ tables/apr_hash.c	(working copy)
@@ -18,6 +18,10 @@
 
 #include "apr_general.h"
 #include "apr_pools.h"
+#include "apr_time.h"
+#if APR_HAVE_STDLIB_H
+#include  /* for rand, srand */
+#endif
 
 #include "apr_hash.h"
 
@@ -75,7 +79,7 @@
 apr_pool_t  *pool;
 apr_hash_entry_t   **array;
 apr_hash_index_t iterator;  /* For apr_hash_first(NULL, ...) */
-unsigned int count, max;
+unsigned int count, max, seed;
 apr_hashfunc_t   hash_func;
 apr_hash_entry_t*free;  /* List of recycled entries */
 };
@@ -95,13 +99,26 @@
 APR_DECLARE(apr_hash_t *) apr_hash_make(apr_pool_t *pool)
 {
 apr_hash_t *ht;
+apr_time_t now;
+
 ht = apr_palloc(pool, sizeof(apr_hash_t));
 ht->pool = pool;
 ht->free = NULL;
 ht->count = 0;
 ht->max = INITIAL_MAX;
+#if APR_HAS_RANDOM
+if (apr_generate_random_bytes(
+(unsigned char *)&ht->seed, sizeof(ht->seed)) != APR_SUCCESS) {
+#endif
+now = apr_time_now();
+srand((unsigned int)((now >> 32) ^ now ^ (apr_time_t)ht));
+ht->seed = (unsigned int)(rand());
+#if APR_HAS_RANDOM
+}
+#endif
 ht->array = alloc_array(ht, ht->max);
-ht->hash_func = apr_hashfunc_default;
+ht->hash_func = NULL;
+
 return ht;
 }
 
@@ -201,10 +218,10 @@
 ht->max = new_max;
 }
 
-APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
-  apr_ssize_t *klen)
+static unsigned int apr_hashfunc_default_internal(const char *char_key,
+  apr_ssize_t *klen,
+  unsigned int hash)
 {
-unsigned int hash = 0;
 const unsigned char *key = (const unsigned char *)char_key;
 const unsigned char *p;
 apr_ssize_t i;
@@ -246,7 +263,7 @@
  *
  *  -- Ralf S. Engelschall 
  */
- 
+
 if (*klen == APR_HASH_KEY_STRING) {
 for (p = key; *p; p++) {
 hash = hash * 33 + *p;
@@ -262,6 +279,11 @@
 return hash;
 }
 
+APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
+  apr_ssize_t *klen)
+{
+return apr_hashfunc_default_internal(char_key, klen, 0);
+}
 
 /*
  * This is where we keep the details of the hash function and control
@@ -280,7 +302,10 @@
 apr_hash_entry_t **hep, *he;
 unsigned int hash;
 
-hash = ht->hash_func(key, &klen);
+if (ht->hash_func)
+hash = ht->hash_func(key, &klen);
+else
+hash = apr_hashfunc_default_internal(key, &klen, ht->seed);
 
 /* scan linked list */
 for (hep = &ht->array[hash & ht->max], he = *hep;
@@ -322,6 +347,7 @@
 ht->free = NULL;
 ht->count = orig->count;
 ht->max = orig->max;
+ht->seed = orig->seed;
 ht->hash_func = orig->hash_func;
 ht->array = (apr_hash_entry_t **)((char *)ht + sizeof(apr_hash_t));
 


Re: Hash collision vectors in APR?

2012-01-10 Thread William A. Rowe Jr.
On 1/11/2012 12:50 AM, Bojan Smojver wrote:
> --- Original message ---
>> From: William A. Rowe Jr.
> 
>> Or, rather, ht->seed = (unsigned int)ht - 1 to avoid the obviousness
>> of folding on even values.
> 
> We could also whack ht into the xor with time and use it as seed to srand(). 
> So many
> possibilities...
> 
> Anyhow, what of the tables - willl they be affected too?

The computational burdens of tables, lists, ordered lists, btrees and
the like are all well understood.

This exception stood out because of the unexpected computational intensity.




Re: Hash collision vectors in APR?

2012-01-10 Thread Bojan Smojver

--- Original message ---

From: William A. Rowe Jr.



Or, rather, ht->seed = (unsigned int)ht - 1 to avoid the obviousness
of folding on even values.


We could also whack ht into the xor with time and use it as seed to 
srand(). So many possibilities...


Anyhow, what of the tables - willl they be affected too?

--
Bojan 


Re: Hash collision vectors in APR?

2012-01-10 Thread Igor Galić


- Original Message -
> Bojan:
> 
> > On Tue, 2012-01-10 at 15:29 -0800, Chris Darroch wrote:
> >> Without having tested explicitly, it looks like the default case
> >> for
> >> modern Linux is APR_HAS_RANDOM=1 and DEV_RANDOM=/dev/random,
> >> with /dev/random blocking when there's no entropy.
> > 
> > Don't think so (run on my F-16 machine, without passing any options
> > to
> > that effect):
> > ---
> > checking for entropy source... /dev/urandom
> > ---
> > 
> > If you look at the test, it has:
> > ---
> > for f in /dev/arandom /dev/urandom /dev/random; do
> > ---
> > 
> > So, non-blocking is preferred on Linux for sure.
> 
>That's good -- I guess the question is what happens if /dev/random
> is chosen, though, either automagically or through an explicit choice
> with --with-devrandom=/dev/random.
> 
>In the latter case, at least, I suppose it might be acceptable for
> apr_hash_make() blocks, since you picked /dev/random and presumably
> know what you're doing.

I've done that (accidentally) on Solaris and Linux, the result was
that for instance creating a new Subversion repository would take
up to 10 minutes.

> Chris.
> 
> --
> GPG Key ID: 088335A9
> GPG Key Fingerprint: 86CD 3297 7493 75BC F820  6715 F54F E648 0883
> 35A9
> 
> 


Re: Hash collision vectors in APR?

2012-01-10 Thread William A. Rowe Jr.
On 1/10/2012 5:29 PM, Chris Darroch wrote:
> Bojan Smojver wrote:
> 
>> On Fri, 2012-01-06 at 10:52 +1100, Bojan Smojver wrote:
>>> +if (apr_generate_random_bytes(
>>> +(unsigned char *)&ht->seed, sizeof(ht->seed)) != APR_SUCCESS)
>>> +return NULL;
>>
>> Chris,
>>
>> What I mean is, instead of return NULL, we could have ht->seed =
>> (unsigned int) ht. Not as random as /dev/urandom, but probably better
>> than zero.

Or, rather, ht->seed = (unsigned int)ht - 1 to avoid the obviousness
of folding on even values.



Re: Hash collision vectors in APR?

2012-01-10 Thread Bojan Smojver
On Wed, 2012-01-11 at 11:01 +1100, Bojan Smojver wrote:
> We can always use ANSI C rand()/srand() (racy)

This is what I meant. Some code stolen from crypto/getuuid.c.

-- 
Bojan
Index: tables/apr_hash.c
===
--- tables/apr_hash.c	(revision 1227896)
+++ tables/apr_hash.c	(working copy)
@@ -18,6 +18,10 @@
 
 #include "apr_general.h"
 #include "apr_pools.h"
+#include "apr_time.h"
+#if APR_HAVE_STDLIB_H
+#include  /* for rand, srand */
+#endif
 
 #include "apr_hash.h"
 
@@ -75,7 +79,7 @@
 apr_pool_t  *pool;
 apr_hash_entry_t   **array;
 apr_hash_index_t iterator;  /* For apr_hash_first(NULL, ...) */
-unsigned int count, max;
+unsigned int count, max, seed;
 apr_hashfunc_t   hash_func;
 apr_hash_entry_t*free;  /* List of recycled entries */
 };
@@ -95,13 +99,26 @@
 APR_DECLARE(apr_hash_t *) apr_hash_make(apr_pool_t *pool)
 {
 apr_hash_t *ht;
+apr_time_t now;
+
 ht = apr_palloc(pool, sizeof(apr_hash_t));
 ht->pool = pool;
 ht->free = NULL;
 ht->count = 0;
 ht->max = INITIAL_MAX;
+#if APR_HAS_RANDOM
+if (apr_generate_random_bytes(
+(unsigned char *)&ht->seed, sizeof(ht->seed)) != APR_SUCCESS) {
+#endif
+now = apr_time_now();
+srand((unsigned int)(((now >> 32) ^ now) & 0x));
+ht->seed = (unsigned int)(rand());
+#if APR_HAS_RANDOM
+}
+#endif
 ht->array = alloc_array(ht, ht->max);
-ht->hash_func = apr_hashfunc_default;
+ht->hash_func = NULL;
+
 return ht;
 }
 
@@ -201,10 +218,10 @@
 ht->max = new_max;
 }
 
-APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
-  apr_ssize_t *klen)
+static unsigned int apr_hashfunc_default_internal(const char *char_key,
+  apr_ssize_t *klen,
+  unsigned int hash)
 {
-unsigned int hash = 0;
 const unsigned char *key = (const unsigned char *)char_key;
 const unsigned char *p;
 apr_ssize_t i;
@@ -246,7 +263,7 @@
  *
  *  -- Ralf S. Engelschall 
  */
- 
+
 if (*klen == APR_HASH_KEY_STRING) {
 for (p = key; *p; p++) {
 hash = hash * 33 + *p;
@@ -262,6 +279,11 @@
 return hash;
 }
 
+APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
+  apr_ssize_t *klen)
+{
+return apr_hashfunc_default_internal(char_key, klen, 0);
+}
 
 /*
  * This is where we keep the details of the hash function and control
@@ -280,7 +302,10 @@
 apr_hash_entry_t **hep, *he;
 unsigned int hash;
 
-hash = ht->hash_func(key, &klen);
+if (ht->hash_func)
+hash = ht->hash_func(key, &klen);
+else
+hash = apr_hashfunc_default_internal(key, &klen, ht->seed);
 
 /* scan linked list */
 for (hep = &ht->array[hash & ht->max], he = *hep;
@@ -322,6 +347,7 @@
 ht->free = NULL;
 ht->count = orig->count;
 ht->max = orig->max;
+ht->seed = orig->seed;
 ht->hash_func = orig->hash_func;
 ht->array = (apr_hash_entry_t **)((char *)ht + sizeof(apr_hash_t));
 


Re: Hash collision vectors in APR?

2012-01-10 Thread Bojan Smojver
On Tue, 2012-01-10 at 15:29 -0800, Chris Darroch wrote:
> I don't know if any of that is useful for APR, but there it is. 

There is a function a bit like this in APR: get_random_info() in
crypto/getuuid.c. Essentially, if there is no
apr_generate_random_bytes(), then do the "magic".

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-10 Thread Bojan Smojver
On Tue, 2012-01-10 at 16:04 -0800, Chris Darroch wrote:
> I guess the question is what happens if /dev/random
> is chosen, though, either automagically or through an explicit choice
> with --with-devrandom=/dev/random. 

I can tell you from experience - your program stalls a lot. And I mean,
A LOT. :-)

Yeah, I'm not hung on apr_generate_random_bytes() at all. Was just
spitballing in that patch...

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-10 Thread Chris Darroch

Bojan:


On Tue, 2012-01-10 at 15:29 -0800, Chris Darroch wrote:

Without having tested explicitly, it looks like the default case for
modern Linux is APR_HAS_RANDOM=1 and DEV_RANDOM=/dev/random,
with /dev/random blocking when there's no entropy.


Don't think so (run on my F-16 machine, without passing any options to
that effect):
---
checking for entropy source... /dev/urandom
---

If you look at the test, it has:
---
for f in /dev/arandom /dev/urandom /dev/random; do
---

So, non-blocking is preferred on Linux for sure.


  That's good -- I guess the question is what happens if /dev/random
is chosen, though, either automagically or through an explicit choice
with --with-devrandom=/dev/random.

  In the latter case, at least, I suppose it might be acceptable for
apr_hash_make() blocks, since you picked /dev/random and presumably
know what you're doing.

Chris.

--
GPG Key ID: 088335A9
GPG Key Fingerprint: 86CD 3297 7493 75BC F820  6715 F54F E648 0883 35A9



Re: Hash collision vectors in APR?

2012-01-10 Thread Bojan Smojver
On Tue, 2012-01-10 at 15:29 -0800, Chris Darroch wrote:

>The patch might also need to account for the !APR_HAS_RANDOM case,
> now that I look again.

Yes, true.

>So I'm less than certain what to suggest here ...

We can always use ANSI C rand()/srand() (racy) or even other APR random
functions from
http://apr.apache.org/docs/apr/1.4/group__apr__random.html.

PS. The address of ht will be somewhat random on Linux platforms with
exec shield.

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-10 Thread Bojan Smojver
On Tue, 2012-01-10 at 15:29 -0800, Chris Darroch wrote:
> Without having tested explicitly, it looks like the default case for
> modern Linux is APR_HAS_RANDOM=1 and DEV_RANDOM=/dev/random,
> with /dev/random blocking when there's no entropy.

Don't think so (run on my F-16 machine, without passing any options to
that effect):
---
checking for entropy source... /dev/urandom
---

If you look at the test, it has:
---
for f in /dev/arandom /dev/urandom /dev/random; do
---

So, non-blocking is preferred on Linux for sure.

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-10 Thread Chris Darroch

Bojan Smojver wrote:


On Fri, 2012-01-06 at 10:52 +1100, Bojan Smojver wrote:

+if (apr_generate_random_bytes(
+(unsigned char *)&ht->seed, sizeof(ht->seed)) != APR_SUCCESS)
+return NULL;


Chris,

What I mean is, instead of return NULL, we could have ht->seed =
(unsigned int) ht. Not as random as /dev/urandom, but probably better
than zero.


  Yes, probably ... I hesitate because I'm not a security expert and
I suppose there could be ways someone might know (a) you don't have
a working apr_generate_random_bytes() and (b) the address of ht struct.

  The patch might also need to account for the !APR_HAS_RANDOM case,
now that I look again.

  The other question, I suppose, is whether there's a hit to performance
from apr_generate_random_bytes() call.  Without having tested explicitly,
it looks like the default case for modern Linux is APR_HAS_RANDOM=1 and
DEV_RANDOM=/dev/random, with /dev/random blocking when there's no entropy.
That might, I think, cause apr_hash_make() to block unexpectedly for
callers.

  The main thing is just that apr_hash_make() really can't return NULL,
and probably shouldn't start blocking unexpectedly, either.  Alas,
too many things assume it succeeds, and succeeds quickly.


  A quick study of what the Perl folks did since 2003, after being
called out on the hash-collision-prediction issue, suggests they
use a quickly generated seed value -- see Perl_seed() in util.c,
which feeds Perl_get_hash_seed(), which is used to set the PL_rehash_seed
value in perl.c and then used in each hash calculation in
PERL_HASH_INTERNAL() in hv.h.  That's sort of all beside the point, but
it's interesting to see the stripped-down seed function, which seems
to boil down to (on most platforms):

#define   SEED_C1   103
#define   SEED_C2   3
#define   SEED_C3   269
#define   SEED_C4   73819
#define   SEED_C5   26107

Time_t when;
U32 u;

PerlProc_gettimeofday(&when,NULL);
u = (U32)SEED_C1 * when.tv_sec + (U32)SEED_C2 * when.tv_usec;
u += SEED_C3 * (U32)PerlProc_getpid();
u += SEED_C4 * (U32)PTR2UV(PL_stack_sp);
u += SEED_C5 * (U32)PTR2UV(&when);
return u;

  So there's a timestamp, a process ID, and two pointers converted
to integers, "spread out" by the various constant multiplier factors.
The code comments suggest this is not especially finely-tuned logic.
See http://perl5.git.perl.org/ for the source.  I don't know if any
of that is useful for APR, but there it is.


  Reading through Ben Laurie's paper on the apr_random_* functions
(http://www.apache-ssl.org/randomness.pdf) and the code I'd like to
say I fully grokked how to use it and whether it's sufficient, but
that would be untrue.  :-)  It seems like the only the process ID is
used, perhaps, unless one mixes in one's own entropy?

  So I'm less than certain what to suggest here ... maybe someone
with greater confidence in the subject can advise on whether a
"fast, simple" seed like the Perl one would be considered reasonable
for this particular problem, namely, making it harder to guess hash
collisions in a context where a large set of externally-provided
values are being used as keys.

Chris.

--
GPG Key ID: 088335A9
GPG Key Fingerprint: 86CD 3297 7493 75BC F820  6715 F54F E648 0883 35A9



Re: Hash collision vectors in APR?

2012-01-09 Thread Bojan Smojver
On Fri, 2012-01-06 at 10:52 +1100, Bojan Smojver wrote:
> +if (apr_generate_random_bytes(
> +(unsigned char *)&ht->seed, sizeof(ht->seed)) != APR_SUCCESS)
> +return NULL;

Chris,

What I mean is, instead of return NULL, we could have ht->seed =
(unsigned int) ht. Not as random as /dev/urandom, but probably better
than zero.

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-09 Thread Bojan Smojver

--- Original message ---

From: Chris Darroch



If there's any
way to ensure the seed is always populated, that would be good.


I guess we could just take a newly created structure of the hash pointer, 
cast it into unsigned int and use that, in case fetching of random bytes 
fails.


--
Bojan 


Re: Hash collision vectors in APR?

2012-01-09 Thread Chris Darroch

Bojan Smojver wrote:


> On Fri, 2012-01-06 at 10:05 +1100, Bojan Smojver wrote:

>> Any other ideas?
> 
> Maybe like this?


  Something like that would be appreciated, by me, at least.

  One caution, though -- if apr_hash_make() can now return NULL,
a lot of things might suddenly break, like httpd.  If there's any
way to ensure the seed is always populated, that would be good.

  There's quite a bit of httpd code that assumes apr_hash_make()
never fails, aside from catastrophic conditions such as having no
available memory.

Thanks,

Chris.

--
GPG Key ID: 088335A9
GPG Key Fingerprint: 86CD 3297 7493 75BC F820  6715 F54F E648 0883 35A9



Re: Hash collision vectors in APR?

2012-01-05 Thread Bojan Smojver
On Fri, 2012-01-06 at 10:05 +1100, Bojan Smojver wrote:
> Any other ideas?

Maybe like this?

-- 
Bojan
Index: tables/apr_hash.c
===
--- tables/apr_hash.c	(revision 1227896)
+++ tables/apr_hash.c	(working copy)
@@ -75,7 +75,7 @@
 apr_pool_t  *pool;
 apr_hash_entry_t   **array;
 apr_hash_index_t iterator;  /* For apr_hash_first(NULL, ...) */
-unsigned int count, max;
+unsigned int count, max, seed;
 apr_hashfunc_t   hash_func;
 apr_hash_entry_t*free;  /* List of recycled entries */
 };
@@ -100,8 +100,12 @@
 ht->free = NULL;
 ht->count = 0;
 ht->max = INITIAL_MAX;
+if (apr_generate_random_bytes(
+(unsigned char *)&ht->seed, sizeof(ht->seed)) != APR_SUCCESS)
+return NULL;
 ht->array = alloc_array(ht, ht->max);
-ht->hash_func = apr_hashfunc_default;
+ht->hash_func = NULL;
+
 return ht;
 }
 
@@ -201,10 +205,10 @@
 ht->max = new_max;
 }
 
-APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
-  apr_ssize_t *klen)
+static unsigned int apr_hashfunc_default_internal(const char *char_key,
+  apr_ssize_t *klen,
+  unsigned int hash)
 {
-unsigned int hash = 0;
 const unsigned char *key = (const unsigned char *)char_key;
 const unsigned char *p;
 apr_ssize_t i;
@@ -246,7 +250,7 @@
  *
  *  -- Ralf S. Engelschall 
  */
- 
+
 if (*klen == APR_HASH_KEY_STRING) {
 for (p = key; *p; p++) {
 hash = hash * 33 + *p;
@@ -262,6 +266,11 @@
 return hash;
 }
 
+APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
+  apr_ssize_t *klen)
+{
+return apr_hashfunc_default_internal(char_key, klen, 0);
+}
 
 /*
  * This is where we keep the details of the hash function and control
@@ -280,7 +289,10 @@
 apr_hash_entry_t **hep, *he;
 unsigned int hash;
 
-hash = ht->hash_func(key, &klen);
+if (ht->hash_func)
+hash = ht->hash_func(key, &klen);
+else
+hash = apr_hashfunc_default_internal(key, &klen, ht->seed);
 
 /* scan linked list */
 for (hep = &ht->array[hash & ht->max], he = *hep;
@@ -322,6 +334,7 @@
 ht->free = NULL;
 ht->count = orig->count;
 ht->max = orig->max;
+ht->seed = orig->seed;
 ht->hash_func = orig->hash_func;
 ht->array = (apr_hash_entry_t **)((char *)ht + sizeof(apr_hash_t));
 


Re: Hash collision vectors in APR?

2012-01-05 Thread Bojan Smojver
On Fri, 2012-01-06 at 09:48 +1100, Bojan Smojver wrote:
> The apr_hashfunc_t function prototype would then most likely have to
> change. We'd probably need to pass the hash itself into it, which
> would then hold the per-hash seed. Right?

Actually, that would not be a good plan. A custom hash function would
not be able to see inside the hash structure, because it's opaque. Most
likely, the seed itself would have to be passed around as the third
argument. Any other ideas?

PS. Of course, this kind of thing cannot be done before 2.0.

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-05 Thread Bojan Smojver
On Thu, 2012-01-05 at 16:39 -0600, William A. Rowe Jr. wrote:
> Question; do we want each hash to have a unique randomization factor?

That would probably be more secure. As is, seed would be initialised
just once per process.

The apr_hashfunc_t function prototype would then most likely have to
change. We'd probably need to pass the hash itself into it, which would
then hold the per-hash seed. Right?

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-05 Thread William A. Rowe Jr.
On 1/5/2012 4:13 PM, Bojan Smojver wrote:
> On Thu, 2012-01-05 at 11:45 -0600, William A. Rowe Jr. wrote:
>> Should we add some randomization to prevent abuse?
> 
> No idea whether this is something that may be useful, but here it is
> nevertheless. At least it can be used as an example of what not to
> do. :-)

Question; do we want each hash to have a unique randomization factor?



Re: Hash collision vectors in APR?

2012-01-05 Thread Bojan Smojver
On Thu, 2012-01-05 at 11:45 -0600, William A. Rowe Jr. wrote:
> Should we add some randomization to prevent abuse?

No idea whether this is something that may be useful, but here it is
nevertheless. At least it can be used as an example of what not to
do. :-)

-- 
Bojan
Index: tables/apr_hash.c
===
--- tables/apr_hash.c	(revision 1227853)
+++ tables/apr_hash.c	(working copy)
@@ -21,6 +21,8 @@
 
 #include "apr_hash.h"
 
+#include "apr_atomic.h"
+
 #if APR_HAVE_STDLIB_H
 #include 
 #endif
@@ -32,6 +34,10 @@
 #include 
 #endif
 
+/* Randomise hash */
+static apr_uint32_t initialised = 0, in_init = 1;
+static unsigned int seed;
+
 /*
  * The internal form of a hash table.
  *
@@ -246,6 +252,17 @@
  *
  *  -- Ralf S. Engelschall 
  */
+
+if (!apr_atomic_inc32(&initialised)) {
+apr_generate_random_bytes(&seed, sizeof(seed));
+apr_atomic_dec32(&in_init);
+}
+apr_atomic_set32(&initialised, 1); /* prevent wrap-around */
+
+while (apr_atomic_read32(&in_init)) /* wait until we get fully inited */
+;
+
+hash = seed;
  
 if (*klen == APR_HASH_KEY_STRING) {
 for (p = key; *p; p++) {


Re: Hash collision vectors in APR?

2012-01-05 Thread Bojan Smojver
On Thu, 2012-01-05 at 11:45 -0600, William A. Rowe Jr. wrote:
> Should we add some randomization to prevent abuse?

There are Ruby patches in RH bug database that may help as an example:

https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2011-4815

-- 
Bojan



Re: Hash collision vectors in APR?

2012-01-05 Thread William A. Rowe Jr.
On 1/5/2012 12:37 PM, Ben Laurie wrote:
> On Thu, Jan 5, 2012 at 5:45 PM, William A. Rowe Jr.  
> wrote:
>> http://www.nruns.com/_downloads/advisory28122011.pdf
>>
>> Should we add some randomization to prevent abuse?
> 
> Yes.

So my question comes down to, if we want to preserve using primes
in the multiplier, has anyone come up with an efficient algorithm
to generate one (or two)?



Re: Hash collision vectors in APR?

2012-01-05 Thread Jeff Trawick
On Thu, Jan 5, 2012 at 12:45 PM, William A. Rowe Jr.
 wrote:
> http://www.nruns.com/_downloads/advisory28122011.pdf
>
> Should we add some randomization to prevent abuse?
>
> It's hard to anticipate how folks might leverage apr, and how malicious
> folks might then seek to exploit computational workload vectors.
>
> Thoughts?

We don't want to say "go fish" to APR applications if our hash plus
their vector results in an exploit.