It sounds like you have given this more thought than I have (for which I am 
grateful).

Still, my need is for a unique tag.  If you provide me with a unique tag which 
also encodes some useful information about source and time, and is guaranteed 
not to roll over in the face of a flood attack, and all the other good things 
you describe, then I'm still happy because you have met my basic need :-)

I do see your point about stripping incoming IDs.  It seems to me that this 
should be a configurable option.  If we had several haproxys (haproxies?) 
stacked, I guess you would want the first one to strip existing tags and all 
the later ones to keep them, since any tag they see would be the one added by 
the first one in the chain.
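A minimal sketch of that configurable behavior in C. The `reqid_policy` struct, its `is_edge` flag and `choose_reqid()` are hypothetical names used only for illustration, not existing haproxy options or functions. The edge instance always replaces a client-supplied ID with a freshly generated one, and inner instances keep the ID added upstream.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instance setting: true for the first haproxy in the
 * chain (the one facing external clients), false for inner instances. */
struct reqid_policy {
    bool is_edge;
};

/* Decide which request ID to forward and log.
 * 'incoming' is the ID found in the request (NULL if absent);
 * 'fresh' is an ID this instance has just generated. */
const char *choose_reqid(const struct reqid_policy *p,
                         const char *incoming, const char *fresh)
{
    if (p->is_edge || incoming == NULL)
        return fresh;      /* edge: never trust a client-supplied ID */
    return incoming;       /* inner: keep the ID added upstream */
}
```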


On Jan 27, 2011, at 5:44 PM, Willy Tarreau wrote:

> Hi Roy,
> 
> On Thu, Jan 27, 2011 at 02:51:37PM -0500, Roy Smith wrote:
>>> Try to think about these cases :
>>> - what to do with reqids that are already present in requests
>> 
>> I don't see how that would happen.  Maybe if we did something like stack one
>> haproxy behind another, but I don't see any reason we would do that.
> 
> Maybe you have a simple enough setup. I know places where a request can pass
> through 5 or 6 haproxies along a whole chain, simply because application
> components are chained and each stage includes a load balancing feature.
> 
>> But, if it were to happen somehow, I think we would just leave it untouched
>> (and log that we saw it).
> 
> That's just the most common thing to do for the inner instances, but the outer
> one needs an easy way to strip it, otherwise external users can inject any ID
> they want into your system.
> 
>> The goal is to have a unique identifier so that every process that's
>> involved in responding to a request can log the id, allowing us to correlate
>> logs. If the incoming request already has such an id, there's no reason to
>> change it.
> 
> Yes, there is, see above ;-)
> 
>>> - what to do with reqids in responses
>>>   => compare them with the request's, block if they do not match
>>>   => delete them or not depending on where you're responding
>> 
>> I'm not sure I understand what you're asking.
> 
> Once you deploy unique IDs, it's common to seek better application
> integration and have deeper application components return the ID they
> received in their responses. That way, the outer component can compare
> the ID it added with the ID it received in the response and ensure that
> there was no session crossing in the whole chain, which unfortunately
> happens from time to time with buggy applications or components (mainly
> in threaded environments).
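A minimal sketch of that response check in C, assuming the deeper component echoes the ID back verbatim. `check_response_reqid()` and the `reqid_check` values are hypothetical names. What to do when the ID is absent (accept, log, or block) is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Outcome of checking the ID echoed back by a deeper component. */
enum reqid_check { REQID_OK, REQID_ABSENT, REQID_MISMATCH };

/* Compare the ID this instance added to the request with the ID found in
 * the response. A mismatch suggests session crossing somewhere in the
 * chain, so the response should be blocked rather than delivered. */
enum reqid_check check_response_reqid(const char *sent, const char *echoed)
{
    if (echoed == NULL)
        return REQID_ABSENT;    /* component does not echo IDs */
    return strcmp(sent, echoed) == 0 ? REQID_OK : REQID_MISMATCH;
}
```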
> 
>>> - what to log
>>>   => do we always need to log a full reqid or can we sometimes just
>>>      log one part of it
>> 
>> What do you mean by logging "one part" of an ID?  The ID is just a unique 
>> tag.  I don't understand how it can be divided into parts.
> 
> An ID can only be unique within a limited space and time. When you have multiple
> processes running on the same machine, a per-process counter is not enough
> anymore so you need to discriminate on the process too, otherwise you end
> up generating multiple identical "unique IDs". Then you add some machines
> and you repeat the same process. Then you take into account the risk of
> rollover of the values and you have to add a timestamp.
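The widening scope can be illustrated with a small C sketch. The `id_scope` struct and the two comparison functions are hypothetical. `host` stands for any numeric host identifier, and the timestamp is left out so that the example focuses on the process and host discriminators.

```c
#include <stdbool.h>
#include <stdint.h>

/* Discriminators that together make an ID unique (timestamp omitted). */
struct id_scope {
    uint32_t host;     /* hypothetical numeric host identifier */
    uint32_t pid;      /* system process ID */
    uint32_t counter;  /* per-process request counter */
};

/* Counter-only comparison: what a single-process design implicitly relies on. */
bool same_by_counter(const struct id_scope *a, const struct id_scope *b)
{
    return a->counter == b->counter;
}

/* Full comparison: two IDs are equal only if every discriminator matches. */
bool same_by_scope(const struct id_scope *a, const struct id_scope *b)
{
    return a->host == b->host && a->pid == b->pid && a->counter == b->counter;
}
```

Two processes on the same host, each starting its counter at 0, are indistinguishable by counter alone but differ as soon as the PID is part of the ID.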
> 
> After about 10 years of feedback using unique IDs, I can say that some
> features are definitely important :
> 
>  - having a timestamp allows you to easily sort your events and correlate
>    them by time. It also tells you where to look.
> 
>  - having some origin information (whether it's the instance which received
>    the first event or the source address itself) helps a lot to correlate
>    logs when some are missing. Logs are *always* missing when you want to
>    correlate large amounts. You discover that one FS got full or that one
>    syslog server was being restarted, or simply that you're dropping a few
>    of them on the wire or in system queues, etc... When you can identify
>    *where* the ID was created, you can reconstruct the missing parts of
>    the chain (assuming you're not missing too many, of course).
> 
>  - having some source information generally helps to quickly search for other
>    occurrences of a similar suspicious event at places where it's hard to
>    log source information. However, it's far from being a requirement, as
>    there are always alternatives. It's just that it helps a lot.
> 
>  - having the ability to certify with good enough confidence that you're
>    not misinterpreting the IDs and that it's not possible that a different
>    event caused it. That's very important when you're bringing your logs
>    to authorities. You don't want to make someone go to jail for someone
>    else.
> 
> The minimum requirement I can identify for an haproxy-based ID to be unique
> would include :
>  - host ID (can be hostname)
>  - system PID
>  - timestamp
>  - counter
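One possible textual rendering of those four fields, as a C sketch. The `reqid` struct, `reqid_format()` and the dash-separated, fixed-width hex layout are assumptions for illustration, not an existing haproxy format.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The four minimum components listed above. */
struct reqid {
    const char *host;    /* host ID, e.g. the hostname */
    uint32_t pid;        /* system PID */
    uint32_t timestamp;  /* Unix time in seconds */
    uint32_t counter;    /* per-process counter */
};

/* Render the ID as "host-PID-TIMESTAMP-COUNTER" with 8-digit hex fields.
 * Returns what snprintf returns: the length the full ID needs. */
int reqid_format(char *buf, size_t len, const struct reqid *id)
{
    return snprintf(buf, len, "%s-%08" PRIX32 "-%08" PRIX32 "-%08" PRIX32,
                    id->host, id->pid, id->timestamp, id->counter);
}
```

For example, host `lb1`, PID 0x1234, timestamp 0x4D41F2B0 and counter 7 render as `lb1-00001234-4D41F2B0-00000007`.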
> 
> The counter must be large enough so as not to roll over within a single
> timestamp value. The host ID must be qualified by the container/zone/VM/etc
> if any are present. That's why it's often easier to split it again in two
> parts, one being an environment ID or instance ID, which can be configured,
> and another one being the system's host name which can optionally be
> configured.
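The requirement that the counter never rolls over within a single timestamp value can be enforced when IDs are generated. A minimal C sketch, assuming a configurable counter width, a timestamp that never goes backwards, and a generator that refuses to issue an ID (rather than waiting for the next second) once every counter value has been used in the current second. `reqid_gen` and `reqid_next()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-process generator state (hypothetical). */
struct reqid_gen {
    unsigned counter_bits;  /* width of the counter field, 1..32 */
    uint32_t last_ts;       /* timestamp value of the current window */
    uint64_t issued;        /* IDs issued while the timestamp was last_ts */
    uint32_t counter;       /* next counter value, wraps at 2^counter_bits */
};

/* Issue the counter value for a new ID stamped with 'now'. The counter is
 * never reset, only wrapped, so (timestamp, counter) pairs stay unique as
 * long as no more than 2^counter_bits IDs share one timestamp value.
 * Returns false when that limit is reached. */
bool reqid_next(struct reqid_gen *g, uint32_t now, uint32_t *out)
{
    uint64_t space = (uint64_t)1 << g->counter_bits;

    if (now != g->last_ts) {   /* new second: start a fresh window */
        g->last_ts = now;
        g->issued = 0;
    }
    if (g->issued >= space)    /* every counter value already used this second */
        return false;

    *out = g->counter;
    g->counter = (uint32_t)(((uint64_t)g->counter + 1) & (space - 1));
    g->issued++;
    return true;
}
```

With `counter_bits` set to 32, a single second can carry 4,294,967,296 IDs before the generator refuses.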
> 
> Systems I have been using also included source and destination for
> convenience, but that's not strictly needed, and they don't reduce
> the minimum size of the counter much.
> 
> Given that I have already managed to make a single instance process slightly
> more than 2 million requests per second (pipelined, and extremely short), it
> means that a 21-bit counter could be made to wrap around in one second in the
> context of an attack. Looking ahead, we can easily see that 24 bits per
> second is not too much to support what could be done in a few years.
> 
> Some organizations need to keep logs for 3, 5 and sometimes 10 years (I'm not
> aware of more than that). 10 years is 315M seconds or 29 bits. So we need
> 29+24 bits split between timestamp and counter. Using two 32-bit entities,
> one with the Unix time and one with a 32-bit counter, is handy and makes sense.
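The sizing arithmetic of the last two paragraphs can be checked with a few lines of C. `bits_needed()`, `time_counter_bits()` and `SECONDS_PER_YEAR` are illustrative helpers, and the year length ignores leap days, which add far less than one bit.

```c
#include <stdint.h>

/* Seconds in a 365-day year; leap days are ignored. */
#define SECONDS_PER_YEAR (365ULL * 24 * 3600)

/* Smallest b such that 2^b >= n: the number of bits needed to give n
 * distinct values without wrapping around. */
unsigned bits_needed(uint64_t n)
{
    unsigned b = 0;

    while (b < 64 && ((uint64_t)1 << b) < n)
        b++;
    return b;
}

/* Bits needed for timestamp + counter, given a log retention period in
 * years and a peak number of IDs issued per second. */
unsigned time_counter_bits(unsigned years, uint64_t peak_per_sec)
{
    return bits_needed(years * SECONDS_PER_YEAR) + bits_needed(peak_per_sec);
}
```

With these helpers, 2,000,000 requests per second need 21 bits (2^21 = 2,097,152), 10 years is 315,360,000 seconds and needs 29 bits (2^29 = 536,870,912), and 10 years at 2^24 requests per second gives 29 + 24 = 53 bits, which fits in two 32-bit fields.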
> 
> The system pid has to cover both usages: nbproc > 1 (which could be done
> with a relative ID) and independent parallel tasks, which really require the
> system-global pid to discriminate them. While most systems are/were using
> 16-bit pids, things are evolving and 32-bit pids have been available for quite
> some time now. Since this part rarely changes, it might make sense to have the
> ability to configure its length. Also, in a few years we'll probably support
> threads and will want to distinguish between multiple threads of the same
> process (though it's also possible that having multiple threads share the
> same reqID generator could be OK if there aren't too many cores).
> Maybe we could use a system-global thread ID instead of the process ID too.
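A configurable-length pid field could be handled as in this C sketch, which refuses a value that does not fit instead of silently truncating it (otherwise two live processes whose pids share the same low bits would get the same discriminator). `encode_pid_field()` is a hypothetical name, and the same check would apply to a system-global thread ID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store a system-global pid (or thread ID) in a field that is 'bits' bits
 * wide (1..32). Returns false if the value does not fit. */
bool encode_pid_field(uint32_t pid, unsigned bits, uint32_t *field)
{
    /* Shifting a 32-bit value by 32 is undefined, so only test narrower fields. */
    if (bits < 32 && (pid >> bits) != 0)
        return false;
    *field = pid;
    return true;
}
```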
> 
> Even with that, we're already at 3*32 bits + system ID, so as you see, it's
> not just a simple counter even if the simple counter can fit some uses. And
> I'd rather have people spot other possible discriminators before we start
> coding than after. If we identify too many variables, maybe we'll have to make
> the format user-configurable.
> 
> Regards,
> Willy
> 


--
Roy Smith
r...@panix.com





