Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-11 Thread Jan Kaluža

On 07/10/2013 06:24 PM, Joe Orton wrote:

On Tue, Jul 09, 2013 at 11:02:18PM +0200, Stefan Fritsch wrote:

On Tuesday 09 July 2013, Joe Orton wrote:

On Tue, Jul 09, 2013 at 10:00:19AM +0200, Jan Kaluza wrote:

I agree 20 bytes could be too much. I have changed my patch to
have only 10 bytes long root. I will check the Daniel's ideas
mentioned in another mail in this thread and try to implement
it, but if we are going to do it my way, I think this patch
should be OK.


+1 here, this looks good to me.


+1


Thanks for the review Stefan & thanks for the patch Jan!

=> http://svn.apache.org/viewvc?view=revision&revision=1501827

This patch actually removes the noticeable delay when running the test
suite, while A::T waits for httpd to start serving requests!  I had no
idea that was caused by mod_unique_id.  Very cool :)

Regards, Joe



Thank you all for reviewing and accepting the patch.

Regards,
Jan Kaluza



Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-10 Thread Joe Orton
On Tue, Jul 09, 2013 at 11:02:18PM +0200, Stefan Fritsch wrote:
> On Tuesday 09 July 2013, Joe Orton wrote:
> > On Tue, Jul 09, 2013 at 10:00:19AM +0200, Jan Kaluza wrote:
> > > I agree 20 bytes could be too much. I have changed my patch to
> > > have only 10 bytes long root. I will check the Daniel's ideas
> > > mentioned in another mail in this thread and try to implement
> > > it, but if we are going to do it my way, I think this patch
> > > should be OK.
> > 
> > +1 here, this looks good to me.
> 
> +1

Thanks for the review Stefan & thanks for the patch Jan!

=> http://svn.apache.org/viewvc?view=revision&revision=1501827

This patch actually removes the noticeable delay when running the test 
suite, while A::T waits for httpd to start serving requests!  I had no 
idea that was caused by mod_unique_id.  Very cool :)

Regards, Joe


Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-09 Thread Stefan Fritsch
On Tuesday 09 July 2013, Joe Orton wrote:
> On Tue, Jul 09, 2013 at 10:00:19AM +0200, Jan Kaluza wrote:
> > I agree 20 bytes could be too much. I have changed my patch to
> > have only 10 bytes long root. I will check the Daniel's ideas
> > mentioned in another mail in this thread and try to implement
> > it, but if we are going to do it my way, I think this patch
> > should be OK.
> 
> +1 here, this looks good to me.

+1

> Any objections to this approach?



Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-09 Thread Stefan Fritsch
On Sunday 07 July 2013, Daniel Lescohier wrote:
> Another option:
> 
> typedef struct {
> apr_uint32_t stamp;
> apr_uint32_t counter;
> apr_uint16_t stamp_fraction;
> char root[ROOT_SIZE];
> } unique_id_rec;
> 
> where ROOT_SIZE=8, and stamp_fraction is set on every request to
> htons(r->request_time & 0x).

If we have a 32bit counter that is incremented atomically, there is 
really no need for the stamp_fraction. 32bit won't overflow in a 
second. But I think the patch proposed by Jan is good enough and does 
not need to use atomic operations.


Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-09 Thread Joe Orton
On Tue, Jul 09, 2013 at 10:00:19AM +0200, Jan Kaluza wrote:
> I agree 20 bytes could be too much. I have changed my patch to have
> only 10 bytes long root. I will check the Daniel's ideas mentioned
> in another mail in this thread and try to implement it, but if we
> are going to do it my way, I think this patch should be OK.

+1 here, this looks good to me.  Any objections to this approach?

Regards, Joe


Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-09 Thread Jan Kaluža

On 07/06/2013 01:58 AM, Stefan Fritsch wrote:

On Wednesday 26 June 2013, Jan Kaluža wrote:

currently mod_unique_id uses apr_gethostname(...) and PID pair as a
base to generate unique ID. The way how it's implemented brings
some problems:

1. For IPv6-only hosts it uses low-order bits of IPv6 address as if
they were unique, which is wrong.

2. It relies on working DNS. It can happen that hostname does not
have the IP assigned during httpd start (for example during the
boot) and I think it is still valid use-case (without
mod_unique_id module loaded, httpd works well in this case).

3. It calls 1 second sleep to overcome possible usage of the same
PID after restart as the one used before the restart.

If I'm right, we could fix the problems above by using
ap_random_insecure_bytes instead of "in_addr"/"pid" pair as a base
for unique ID generation. It would also make the code simpler. I
think the randomness generated by ap_random_insecure_bytes() is at
least the same as the one introduced by apr_gethostname() + pid
pair.

The attached patch implements it by removing in_addr/pid fields
unique_id_rec struct and introduces new "root" field which is
initialized using ap_random_insecure_bytes() function and is used
as a base for unique IDs.


I agree in principle, but I would prefer the ID length to not increase
that much. You replace 8 bytes ip+pid with 20 bytes "root", which
means 16 bytes increase in base64 format. The unique id is also used
as a request log id and having an additional 16 bytes of prefix in the
error log (if one uses that feature) does not really appeal to me.


Thanks for reviewing the patch.

I agree 20 bytes could be too much. I have changed my patch to have only 
10 bytes long root. I will check the Daniel's ideas mentioned in another 
mail in this thread and try to implement it, but if we are going to do 
it my way, I think this patch should be OK.



But 20 bytes may be excessive, anyway: If one assumes 1 servers
over which the IDs needs to be unique, and uses a 8 byte "root" field,
the probability for a collision is 3*10^(-12) if I did the math
correctly. This still has to be multiplied with the number of restarts
of the servers, so to be save we could use a 10 byte "root" field,
giving a collision probability (per server restart) of 4*10^(-17),
which should be ok.



Regards,
Jan Kaluza

diff --git a/modules/metadata/mod_unique_id.c b/modules/metadata/mod_unique_id.c
index 8bd858f..8756055 100644
--- a/modules/metadata/mod_unique_id.c
+++ b/modules/metadata/mod_unique_id.c
@@ -31,14 +31,11 @@
 #include "http_log.h"
 #include "http_protocol.h"  /* for ap_hook_post_read_request */
 
-#if APR_HAVE_UNISTD_H
-#include  /* for getpid() */
-#endif
+#define ROOT_SIZE 10
 
 typedef struct {
 unsigned int stamp;
-unsigned int in_addr;
-unsigned int pid;
+char root[ROOT_SIZE];
 unsigned short counter;
 unsigned int thread_index;
 } unique_id_rec;
@@ -64,18 +61,13 @@ typedef struct {
  * gethostbyname (gethostname()) is unique across all the machines at the
  * "site".
  *
- * We also further assume that pids fit in 32-bits.  If something uses more
- * than 32-bits, the fix is trivial, but it requires the unrolled uuencoding
- * loop to be extended.  * A similar fix is needed to support multithreaded
- * servers, using a pid/tid combo.
- *
- * Together, the in_addr and pid are assumed to absolutely uniquely identify
+ * The root is assumed to absolutely uniquely identify
  * this one child from all other currently running children on all servers
  * (including this physical server if it is running multiple httpds) from each
  * other.
  *
  * The stamp and counter are used to distinguish all hits for a particular
- * (in_addr,pid) pair.  The stamp is updated using r->request_time,
+ * root.  The stamp is updated using r->request_time,
  * saving cpu cycles.  The counter is never reset, and is used to permit up to
  * 64k requests in a single second by a single child.
  *
@@ -92,7 +84,7 @@ typedef struct {
  * module change.
  *
  * It is highly desirable that identifiers exist for "eternity".  But future
- * needs (such as much faster webservers, moving to 64-bit pids, or moving to a
+ * needs (such as much faster webservers, or moving to a
  * multithreaded server) may dictate a need to change the contents of
  * unique_id_rec.  Such a future implementation should ensure that the first
  * field is still a time_t stamp.  By doing that, it is possible for a site to
@@ -116,8 +108,6 @@ typedef struct {
  * htonl/ntohl. Well, this shouldn't be a problem till year 2106.
  */
 
-static unsigned global_in_addr;
-
 /*
  * XXX: We should have a per-thread counter and not use cur_unique_id.counter
  * XXX: in all threads, because this is bad for performance on multi-processor
@@ -129,7 +119,7 @@ static unique_id_rec cur_unique_id;
 /*
  * Number of elements in the structure unique_id_rec.
  */
-#define UNIQUE_ID_REC_MAX 5
+#define UNIQUE_ID_REC_MAX 4
 
 static unsigned

Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-07 Thread Daniel Lescohier
Actually, for stamp_fraction, one should choose either:

r->request_time & 0x /* microseconds modulus 65536 */

apr_uint16_t ui16;
htons(ui16 = apr_time_usec(r->request_time) >> 4) /* fraction of a second
with denominator 62500 */



On Sun, Jul 7, 2013 at 7:32 AM, Daniel Lescohier
wrote:

> Another option:
>
>
> typedef struct {
> apr_uint32_t stamp;
> apr_uint32_t counter;
> apr_uint16_t stamp_fraction;
> char root[ROOT_SIZE];
> } unique_id_rec;
>
> where ROOT_SIZE=8, and stamp_fraction is set on every request to
> htons(r->request_time & 0x).
>
> The child will be initialized with 12 bytes of random data (root +
> counter).  If one is not as confident of having a good random source, then
> it's better to also include the fractional second: there will be few
> requests happening in the same 1/65,536th of a second of the wall clock; if
> there are multiple requests in that fraction of a second, if the multiple
> requests are in the same child process, then the atomic increment
> (apr_atomic_inc32) of the counter alone will guarantee different ids;
> otherwise, for those few requests in that fraction of a second in multiple
> processes (on the same or different servers), the randomness initialized by
> the child in root+counter will be used to make different ids.
>
>
>
> On Sat, Jul 6, 2013 at 9:38 AM, Daniel Lescohier <
> daniel.lescoh...@cbsi.com> wrote:
>
>> Ah, I missed that.  I see it's in the doxygen docs for the random
>> module.  However, the sources aren't under random/, they're under misc/.  I
>> was switching between the doxygen docs and looking at the sources; maybe
>> when I was looking for it, I missed it because the sources were under misc/.
>>
>> For the standard module: you can keep the unique id size the same, but
>> still have a root of 10 bytes, by getting rid of the thread_index:
>> thread_index is very wasteful, using 32 bits, when only a few bits are used.
>>
>> Change:
>>
>> typedef struct {
>> unsigned int stamp;
>> unsigned int in_addr;
>> unsigned int pid;
>> unsigned short counter;
>> unsigned int thread_index;
>> } unique_id_rec;
>>
>> to:
>>
>> typedef struct {
>> apr_uint32_t stamp;
>> apr_uint32_t counter;
>> char root[ROOT_SIZE];
>> } unique_id_rec;
>>
>> Have the two ints first for alignment purposes, so there is no padding in
>> the struct.
>>
>> With the counter field using apr_uint32_t, you can use apr_atomic_inc32()
>> to do the increments.  There is less need of thread_index if you increment
>> the counter atomically.
>>
>> The initialization of the counter with random data also gives you more
>> process randomness than the 10 bytes of root.
>>
>>
>>
>> On Fri, Jul 5, 2013 at 8:04 PM, Stefan Fritsch  wrote:
>>
>>> On Wednesday 26 June 2013, Daniel Lescohier wrote:
>>> > When I looked into the ap random functions, I didn't like the
>>> > implementation, because I didn't see anywhere in the httpd codebase
>>> > that entropy is periodically added to the entropy pool.  After
>>> > reading the details of how the Linux entropy pool works
>>> > (https://lwn
>>> > .net/Articles/525204/), I decided to use /dev/urandom instead,
>>> > since Linux is periodically adding entropy to it.  This code is
>>> > not portable, but this was for a private Apache module that is
>>> > only used on Linux.
>>> >
>>> > To preserve entropy on the web server machine, I also only generate
>>> > a random number once per apache child, then increment an uint32
>>> > portion of it for each unique id call.  I also have seconds and
>>> > microseconds, so that's why I think it's OK to do increments from
>>> > the random base, instead of generating a new random id on each
>>> > request.
>>>
>>> The "insecure" in ap_random_insecure_bytes is there for a reason. But
>>> if you only use it once per process, anyway, it should be sufficient.
>>> The fact that several consumers (especially with multi-threaded mpms)
>>> pull from the same pool in undefined order adds some entropy, too.
>>>
>>> FWIW, there is apr_generate_random_bytes() which can do the reading of
>>> /dev/urandom for you.
>>>
>>
>>
>


Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-07 Thread Daniel Lescohier
Another option:

typedef struct {
apr_uint32_t stamp;
apr_uint32_t counter;
apr_uint16_t stamp_fraction;
char root[ROOT_SIZE];
} unique_id_rec;

where ROOT_SIZE=8, and stamp_fraction is set on every request to
htons(r->request_time & 0x).

The child will be initialized with 12 bytes of random data (root +
counter).  If one is not as confident of having a good random source, then
it's better to also include the fractional second: there will be few
requests happening in the same 1/65,536th of a second of the wall clock; if
there are multiple requests in that fraction of a second, if the multiple
requests are in the same child process, then the atomic increment
(apr_atomic_inc32) of the counter alone will guarantee different ids;
otherwise, for those few requests in that fraction of a second in multiple
processes (on the same or different servers), the randomness initialized by
the child in root+counter will be used to make different ids.



On Sat, Jul 6, 2013 at 9:38 AM, Daniel Lescohier
wrote:

> Ah, I missed that.  I see it's in the doxygen docs for the random module.
> However, the sources aren't under random/, they're under misc/.  I was
> switching between the doxygen docs and looking at the sources; maybe when I
> was looking for it, I missed it because the sources were under misc/.
>
> For the standard module: you can keep the unique id size the same, but
> still have a root of 10 bytes, by getting rid of the thread_index:
> thread_index is very wasteful, using 32 bits, when only a few bits are used.
>
> Change:
>
> typedef struct {
> unsigned int stamp;
> unsigned int in_addr;
> unsigned int pid;
> unsigned short counter;
> unsigned int thread_index;
> } unique_id_rec;
>
> to:
>
> typedef struct {
> apr_uint32_t stamp;
> apr_uint32_t counter;
> char root[ROOT_SIZE];
> } unique_id_rec;
>
> Have the two ints first for alignment purposes, so there is no padding in
> the struct.
>
> With the counter field using apr_uint32_t, you can use apr_atomic_inc32()
> to do the increments.  There is less need of thread_index if you increment
> the counter atomically.
>
> The initialization of the counter with random data also gives you more
> process randomness than the 10 bytes of root.
>
>
>
> On Fri, Jul 5, 2013 at 8:04 PM, Stefan Fritsch  wrote:
>
>> On Wednesday 26 June 2013, Daniel Lescohier wrote:
>> > When I looked into the ap random functions, I didn't like the
>> > implementation, because I didn't see anywhere in the httpd codebase
>> > that entropy is periodically added to the entropy pool.  After
>> > reading the details of how the Linux entropy pool works
>> > (https://lwn
>> > .net/Articles/525204/), I decided to use /dev/urandom instead,
>> > since Linux is periodically adding entropy to it.  This code is
>> > not portable, but this was for a private Apache module that is
>> > only used on Linux.
>> >
>> > To preserve entropy on the web server machine, I also only generate
>> > a random number once per apache child, then increment an uint32
>> > portion of it for each unique id call.  I also have seconds and
>> > microseconds, so that's why I think it's OK to do increments from
>> > the random base, instead of generating a new random id on each
>> > request.
>>
>> The "insecure" in ap_random_insecure_bytes is there for a reason. But
>> if you only use it once per process, anyway, it should be sufficient.
>> The fact that several consumers (especially with multi-threaded mpms)
>> pull from the same pool in undefined order adds some entropy, too.
>>
>> FWIW, there is apr_generate_random_bytes() which can do the reading of
>> /dev/urandom for you.
>>
>
>


Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-06 Thread Daniel Lescohier
Ah, I missed that.  I see it's in the doxygen docs for the random module.
However, the sources aren't under random/, they're under misc/.  I was
switching between the doxygen docs and looking at the sources; maybe when I
was looking for it, I missed it because the sources were under misc/.

For the standard module: you can keep the unique id size the same, but
still have a root of 10 bytes, by getting rid of the thread_index:
thread_index is very wasteful, using 32 bits, when only a few bits are used.

Change:

typedef struct {
unsigned int stamp;
unsigned int in_addr;
unsigned int pid;
unsigned short counter;
unsigned int thread_index;
} unique_id_rec;

to:

typedef struct {
apr_uint32_t stamp;
apr_uint32_t counter;
char root[ROOT_SIZE];
} unique_id_rec;

Have the two ints first for alignment purposes, so there is no padding in
the struct.

With the counter field using apr_uint32_t, you can use apr_atomic_inc32()
to do the increments.  There is less need of thread_index if you increment
the counter atomically.

The initialization of the counter with random data also gives you more
process randomness than the 10 bytes of root.



On Fri, Jul 5, 2013 at 8:04 PM, Stefan Fritsch  wrote:

> On Wednesday 26 June 2013, Daniel Lescohier wrote:
> > When I looked into the ap random functions, I didn't like the
> > implementation, because I didn't see anywhere in the httpd codebase
> > that entropy is periodically added to the entropy pool.  After
> > reading the details of how the Linux entropy pool works
> > (https://lwn
> > .net/Articles/525204/), I decided to use /dev/urandom instead,
> > since Linux is periodically adding entropy to it.  This code is
> > not portable, but this was for a private Apache module that is
> > only used on Linux.
> >
> > To preserve entropy on the web server machine, I also only generate
> > a random number once per apache child, then increment an uint32
> > portion of it for each unique id call.  I also have seconds and
> > microseconds, so that's why I think it's OK to do increments from
> > the random base, instead of generating a new random id on each
> > request.
>
> The "insecure" in ap_random_insecure_bytes is there for a reason. But
> if you only use it once per process, anyway, it should be sufficient.
> The fact that several consumers (especially with multi-threaded mpms)
> pull from the same pool in undefined order adds some entropy, too.
>
> FWIW, there is apr_generate_random_bytes() which can do the reading of
> /dev/urandom for you.
>


Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-05 Thread Stefan Fritsch
On Wednesday 26 June 2013, Daniel Lescohier wrote:
> When I looked into the ap random functions, I didn't like the
> implementation, because I didn't see anywhere in the httpd codebase
> that entropy is periodically added to the entropy pool.  After
> reading the details of how the Linux entropy pool works
> (https://lwn
> .net/Articles/525204/), I decided to use /dev/urandom instead,
> since Linux is periodically adding entropy to it.  This code is
> not portable, but this was for a private Apache module that is
> only used on Linux.
> 
> To preserve entropy on the web server machine, I also only generate
> a random number once per apache child, then increment an uint32
> portion of it for each unique id call.  I also have seconds and
> microseconds, so that's why I think it's OK to do increments from
> the random base, instead of generating a new random id on each
> request.

The "insecure" in ap_random_insecure_bytes is there for a reason. But 
if you only use it once per process, anyway, it should be sufficient. 
The fact that several consumers (especially with multi-threaded mpms) 
pull from the same pool in undefined order adds some entropy, too.

FWIW, there is apr_generate_random_bytes() which can do the reading of 
/dev/urandom for you.


Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-07-05 Thread Stefan Fritsch
On Wednesday 26 June 2013, Jan Kaluža wrote:
> currently mod_unique_id uses apr_gethostname(...) and PID pair as a
> base to generate unique ID. The way how it's implemented brings
> some problems:
> 
> 1. For IPv6-only hosts it uses low-order bits of IPv6 address as if
> they were unique, which is wrong.
> 
> 2. It relies on working DNS. It can happen that hostname does not
> have the IP assigned during httpd start (for example during the
> boot) and I think it is still valid use-case (without
> mod_unique_id module loaded, httpd works well in this case).
> 
> 3. It calls 1 second sleep to overcome possible usage of the same
> PID after restart as the one used before the restart.
> 
> If I'm right, we could fix the problems above by using
> ap_random_insecure_bytes instead of "in_addr"/"pid" pair as a base
> for unique ID generation. It would also make the code simpler. I
> think the randomness generated by ap_random_insecure_bytes() is at
> least the same as the one introduced by apr_gethostname() + pid
> pair.
> 
> The attached patch implements it by removing in_addr/pid fields
> unique_id_rec struct and introduces new "root" field which is
> initialized using ap_random_insecure_bytes() function and is used
> as a base for unique IDs.

I agree in principle, but I would prefer the ID length to not increase 
that much. You replace 8 bytes ip+pid with 20 bytes "root", which 
means 16 bytes increase in base64 format. The unique id is also used 
as a request log id and having an additional 16 bytes of prefix in the 
error log (if one uses that feature) does not really appeal to me.

But 20 bytes may be excessive, anyway: If one assumes 1 servers 
over which the IDs needs to be unique, and uses a 8 byte "root" field, 
the probability for a collision is 3*10^(-12) if I did the math 
correctly. This still has to be multiplied with the number of restarts 
of the servers, so to be save we could use a 10 byte "root" field, 
giving a collision probability (per server restart) of 4*10^(-17), 
which should be ok.


Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-06-26 Thread Daniel Lescohier
We have an internal custom Apache module that generates UUIDs; it's using a
legacy format that we need for our application.  A few months ago, I was
upgrading it to Apache 2.4, and I worked on modifying the UUID generator
function, so that some of the bytes are random (including replacing the 4
byte IPv4 address element of the uuid that we used before with random data
instead).

When I looked into the ap random functions, I didn't like the
implementation, because I didn't see anywhere in the httpd codebase that
entropy is periodically added to the entropy pool.  After reading the
details of how the Linux entropy pool works (https://lwn
.net/Articles/525204/), I decided to use /dev/urandom instead, since Linux
is periodically adding entropy to it.  This code is not portable, but this
was for a private Apache module that is only used on Linux.

To preserve entropy on the web server machine, I also only generate a
random number once per apache child, then increment an uint32 portion of it
for each unique id call.  I also have seconds and microseconds, so that's
why I think it's OK to do increments from the random base, instead of
generating a new random id on each request.

Our app requires network order time_t in the second 32-bit word, so that's
why the function is like below.  If our app didn't require time_t, then I
would not have used the high order bits of time_t, because it's fairly
constant: to have a more unique id, it would be better to have more random
data instead of that fairly constant data.

/**
* Make a new unique id as follows:
* 1. Define the id.
* 2. Base64-encode the id.
* @param r Apache's request record.
* @return The base64-encode cookie string, allocated in the request pool.
*/
static char *make_id( request_rec *r )
{
#define RAW_COOKIE_SIZE (14)
#define WANT_BASE64_COOKIE_SIZE (19)
#define BASE64_ENCODE_LEN(len) (((len + 2) / 3 * 4) + 1)
#define MASK_BITS(n) ((1request_time);
raw_cookie.i1 = htonl(i);
raw_cookie.s1 = rnd >> 11;
raw_cookie.i2 = htonl(apr_time_sec(r->request_time));
raw_cookie.i3 = rand_uints.i2;

apr_base64_encode_binary(b64_cookie, (const void *)&raw_cookie,
RAW_COOKIE_SIZE);
if (BASE64_ENCODE_LEN(RAW_COOKIE_SIZE) > WANT_BASE64_COOKIE_SIZE)
b64_cookie[WANT_BASE64_COOKIE_SIZE] = '\0';
return b64_cookie;
}

And supporting code:

typedef struct {
apr_uint32_t i1;
apr_uint32_t i2;
} rand_uints_t;
static rand_uints_t rand_uints;
static apr_thread_mutex_t *rand_mutex;
static int rand_initialized;

static void init_random(request_rec *r)
{
ap_assert(APR_SUCCESS == apr_thread_mutex_lock(rand_mutex));
if (rand_initialized == 0) { /* we won the race to initialize */
apr_file_t *f;
apr_size_t bytes_read;
ap_assert(APR_SUCCESS == apr_file_open(&f, "/dev/urandom",
APR_READ|APR_BINARY, 0444, r->pool));
ap_assert(APR_SUCCESS == apr_file_read_full(f, &rand_uints,
sizeof(rand_uints_t), &bytes_read));
ap_assert(sizeof(rand_uints) == bytes_read);
ap_assert(APR_SUCCESS == apr_file_close(f));
rand_initialized = 1;
}
ap_assert(APR_SUCCESS == apr_thread_mutex_unlock(rand_mutex));
}


/**
* Initialization for each child process.
*/
static void init_child(apr_pool_t *p, server_rec *s) {
ap_assert(APR_SUCCESS == apr_thread_mutex_create(&rand_mutex,
APR_THREAD_MUTEX_DEFAULT, p));
}



On Wed, Jun 26, 2013 at 8:49 AM, Jan Kaluža  wrote:

> Hi,
>
> currently mod_unique_id uses apr_gethostname(...) and PID pair as a base
> to generate unique ID. The way how it's implemented brings some problems:
>
> 1. For IPv6-only hosts it uses low-order bits of IPv6 address as if they
> were unique, which is wrong.
>
> 2. It relies on working DNS. It can happen that hostname does not have the
> IP assigned during httpd start (for example during the boot) and I think it
> is still valid use-case (without mod_unique_id module loaded, httpd works
> well in this case).
>
> 3. It calls 1 second sleep to overcome possible usage of the same PID
> after restart as the one used before the restart.
>
> If I'm right, we could fix the problems above by using
> ap_random_insecure_bytes instead of "in_addr"/"pid" pair as a base for
> unique ID generation. It would also make the code simpler. I think the
> randomness generated by ap_random_insecure_bytes() is at least the same as
> the one introduced by apr_gethostname() + pid pair.
>
> The attached patch implements it by removing in_addr/pid fields
> unique_id_rec struct and introduces new "root" field which is initialized
> using ap_random_insecu

[PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID

2013-06-26 Thread Jan Kaluža

Hi,

currently mod_unique_id uses apr_gethostname(...) and PID pair as a base 
to generate unique ID. The way how it's implemented brings some problems:


1. For IPv6-only hosts it uses low-order bits of IPv6 address as if they 
were unique, which is wrong.


2. It relies on working DNS. It can happen that hostname does not have 
the IP assigned during httpd start (for example during the boot) and I 
think it is still valid use-case (without mod_unique_id module loaded, 
httpd works well in this case).


3. It calls 1 second sleep to overcome possible usage of the same PID 
after restart as the one used before the restart.


If I'm right, we could fix the problems above by using 
ap_random_insecure_bytes instead of "in_addr"/"pid" pair as a base for 
unique ID generation. It would also make the code simpler. I think the 
randomness generated by ap_random_insecure_bytes() is at least the same 
as the one introduced by apr_gethostname() + pid pair.


The attached patch implements it by removing in_addr/pid fields 
unique_id_rec struct and introduces new "root" field which is 
initialized using ap_random_insecure_bytes() function and is used as a 
base for unique IDs.


Regards,
Jan Kaluza
diff --git a/modules/metadata/mod_unique_id.c b/modules/metadata/mod_unique_id.c
index 8bd858f..8756055 100644
--- a/modules/metadata/mod_unique_id.c
+++ b/modules/metadata/mod_unique_id.c
@@ -31,14 +31,11 @@
 #include "http_log.h"
 #include "http_protocol.h"  /* for ap_hook_post_read_request */
 
-#if APR_HAVE_UNISTD_H
-#include  /* for getpid() */
-#endif
+#define ROOT_SIZE 20
 
 typedef struct {
 unsigned int stamp;
-unsigned int in_addr;
-unsigned int pid;
+char root[ROOT_SIZE];
 unsigned short counter;
 unsigned int thread_index;
 } unique_id_rec;
@@ -64,18 +61,13 @@ typedef struct {
  * gethostbyname (gethostname()) is unique across all the machines at the
  * "site".
  *
- * We also further assume that pids fit in 32-bits.  If something uses more
- * than 32-bits, the fix is trivial, but it requires the unrolled uuencoding
- * loop to be extended.  * A similar fix is needed to support multithreaded
- * servers, using a pid/tid combo.
- *
- * Together, the in_addr and pid are assumed to absolutely uniquely identify
+ * The root is assumed to absolutely uniquely identify
  * this one child from all other currently running children on all servers
  * (including this physical server if it is running multiple httpds) from each
  * other.
  *
  * The stamp and counter are used to distinguish all hits for a particular
- * (in_addr,pid) pair.  The stamp is updated using r->request_time,
+ * root.  The stamp is updated using r->request_time,
  * saving cpu cycles.  The counter is never reset, and is used to permit up to
  * 64k requests in a single second by a single child.
  *
@@ -92,7 +84,7 @@ typedef struct {
  * module change.
  *
  * It is highly desirable that identifiers exist for "eternity".  But future
- * needs (such as much faster webservers, moving to 64-bit pids, or moving to a
+ * needs (such as much faster webservers, or moving to a
  * multithreaded server) may dictate a need to change the contents of
  * unique_id_rec.  Such a future implementation should ensure that the first
  * field is still a time_t stamp.  By doing that, it is possible for a site to
@@ -116,8 +108,6 @@ typedef struct {
  * htonl/ntohl. Well, this shouldn't be a problem till year 2106.
  */
 
-static unsigned global_in_addr;
-
 /*
  * XXX: We should have a per-thread counter and not use cur_unique_id.counter
  * XXX: in all threads, because this is bad for performance on multi-processor
@@ -129,7 +119,7 @@ static unique_id_rec cur_unique_id;
 /*
  * Number of elements in the structure unique_id_rec.
  */
-#define UNIQUE_ID_REC_MAX 5
+#define UNIQUE_ID_REC_MAX 4
 
 static unsigned short unique_id_rec_offset[UNIQUE_ID_REC_MAX],
   unique_id_rec_size[UNIQUE_ID_REC_MAX],
@@ -138,24 +128,17 @@ static unsigned short unique_id_rec_offset[UNIQUE_ID_REC_MAX],
 
 static int unique_id_global_init(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *ptemp, server_rec *main_server)
 {
-char str[APRMAXHOSTLEN + 1];
-apr_status_t rv;
-char *ipaddrstr;
-apr_sockaddr_t *sockaddr;
-
 /*
  * Calculate the sizes and offsets in cur_unique_id.
  */
 unique_id_rec_offset[0] = APR_OFFSETOF(unique_id_rec, stamp);
 unique_id_rec_size[0] = sizeof(cur_unique_id.stamp);
-unique_id_rec_offset[1] = APR_OFFSETOF(unique_id_rec, in_addr);
-unique_id_rec_size[1] = sizeof(cur_unique_id.in_addr);
-unique_id_rec_offset[2] = APR_OFFSETOF(unique_id_rec, pid);
-unique_id_rec_size[2] = sizeof(cur_unique_id.pid);
-unique_id_rec_offset[3] = APR_OFFSETOF(unique_id_rec, counter);
-unique_id_rec_size[3] = sizeof(cur_unique_id.counter);
-unique_id_rec_offset[4] = APR_OFFSETOF(unique_id_rec, thread_index);
-unique_id_rec_size[4] = sizeof(cur_unique_id.thread_index);
+uni