Hi Pierre,

On Mon, Aug 24, 2020 at 08:17:05AM +0000, Pierre Cheynier wrote:
> On Fri, Aug 21, 2020 at 8:11 PM William Dauchy <wdau...@gmail.com> wrote:
> 
> So awesome to get the first response from your direct colleague :)
> 
> > I believe this is expected; this behaviour has changed since v2.1 though.
> 
> Indeed, we don't use this logging variable since a long time, so I'm not 
> really able to confirm if this is so new.
> Anyway, I understand this is related to handling h2 and its specifics, still 
> I think there should be something to fix (in one way or the other) to get 
> back to a consistent/deterministic meaning of %HP (and maybe in other places 
> where this had an impact).
> 
> Willy, any thought about this?

What is certain is that I don't want to start logging false or misleading
information. The issue stems from browsers using a different way to represent
a resource in a request depending on the HTTP version (and it's not their
fault, it's the recommended way to do it). In HTTP/1.x we used to have:
  - relative URIs made of a path only with a separate Host header
  - absolute URIs made of a scheme + host + path and the Host header
    having to be repeated.

In HTTP/2 this has been greatly simplified and only the second form (the
full URI) was kept by default. In order to save servers from having to
parse these elements, they are sent already split into:
  - a ":scheme" pseudo header holding the scheme of the URI
  - a ":authority" pseudo header holding the equivalent of the host part
    of the URI
  - a ":path" pseudo header holding the path part of the URI

And no Host header is needed anymore.
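
To make the split concrete, here is a minimal Python sketch (not HAProxy
code) that puts the three pseudo headers back together into the absolute
URI an HTTP/1 client would have sent. The function name and the sample
values are purely illustrative assumptions.

```python
def h2_to_absolute_uri(pseudo_headers):
    """Rebuild an absolute URI from HTTP/2 request pseudo headers.

    Hypothetical helper for illustration only: it assumes all three
    pseudo headers are present and does no validation.
    """
    scheme = pseudo_headers[":scheme"]
    authority = pseudo_headers[":authority"]
    path = pseudo_headers[":path"]
    return f"{scheme}://{authority}{path}"


# Sample request, values made up for the example.
request = {":scheme": "https", ":authority": "foo", ":path": "/img/"}
print(h2_to_absolute_uri(request))
```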

Thus an HTTP/2 request effectively "looks like" an HTTP/1 request using
an absolute URI. What causes the mess in the logs is that such HTTP/1
requests are rarely used (mostly only for proxies), but they are perfectly
valid and given that they are often used to take routing decisions, it's
mandatory that they are part of the logs. For example if you decide that
every *url* starting with "/img" has to be routed to the static server
and the rest to the application, you're forgetting that "https://foo/img/"
is valid as well and will be routed to the application. That's why I do
not want to report fake or reconstructed information in the logs.
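
The routing pitfall can be sketched in a few lines of Python. This is only
an illustration of the reasoning, not how HAProxy evaluates its rules; the
function names and the "static"/"app" labels are assumptions made up for
the example.

```python
from urllib.parse import urlsplit


def naive_route(request_target):
    # Matches on the raw request target, so an absolute URI never
    # starts with "/img" and falls through to the application.
    return "static" if request_target.startswith("/img") else "app"


def path_route(request_target):
    # Extracts the path component first, so both the relative and the
    # absolute form of the same resource are routed identically.
    path = urlsplit(request_target).path
    return "static" if path.startswith("/img") else "app"


for target in ("/img/", "https://foo/img/"):
    print(target, naive_route(target), path_route(target))
```

With the naive rule the second target ends up on the application, which is
exactly the kind of decision the logs must allow you to explain afterwards.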

In 1.8, when we introduced H2, it was directly turned into HTTP/1.1 before
being processed, and given that we didn't support server-side H2, the most
seamless way to handle it was to just replace everything with origin-form
requests (no authority). That remained till 2.0, since it was not really
acceptable to imagine that, depending on whether you enabled HTX or not,
you'd get different logs for the exact same request. But now we cannot
cheat anymore; it has caused too much trouble already.

What I understand however is that it's possible that we need to rethink
what we're logging. Maybe instead of logging the URI by default (and missing
the Host in HTTP/1) we ought to instead log the scheme, authority and path
parts. These are not always there (scheme or authority in H1) but we can
log an empty field like "-" in this case.
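
As a rough sketch of what such a log line could contain, the Python below
splits a request target into the three parts and substitutes "-" for any
part that is absent. The function name, the field order and the
space-separated layout are assumptions for illustration, not a proposed
format; authority-form targets (as used by CONNECT) are not handled.

```python
from urllib.parse import urlsplit


def split_log_fields(request_target):
    """Return 'scheme authority path', using '-' for missing parts.

    Hypothetical helper: only origin-form ("/path") and absolute-form
    ("scheme://host/path") request targets are considered here.
    """
    parts = urlsplit(request_target)
    scheme = parts.scheme or "-"
    authority = parts.netloc or "-"
    path = parts.path or "-"
    return f"{scheme} {authority} {path}"


print(split_log_fields("/img/logo.png"))
print(split_log_fields("https://foo/img/"))
```

An H1 request with a relative URI would thus be logged with two empty
fields, while an H2 request or an H1 absolute URI fills all three.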

We cannot realistically do that in the default "httplog" format, but we
can imagine a new default format which would report such info (htxlog?),
or maybe renaming httplog to http-legacy-log and changing the httplog's
default. We should then consider this opportunity to revisit certain
fields that do not make sense anymore, like the "~" in front of the
frontend's name for SSL, the various timers that need to report idle
and probably user-facing time instead of just data, etc.

There was something important I've been wanting for a few versions, which
was to have named log formats that we could declare in a central place and
use everywhere. It would tremendously help here. I know it can be done
using environment variables declared in the global section but I personally
find this ugly.

So I think it's the right place to open such a discussion (what we should
log, and whether or not it loses info by default or requires duplicating
some data while waiting for the response), so that we can reach a better
and more modern solution. I'm open to proposals.

Cheers,
Willy
