Hi Alexander,

I'm starting from the doc as it eases the discussion.

On Thu, Oct 05, 2023 at 11:05:50AM +0000, Stephan, Alexander wrote:
> --- a/doc/configuration.txt
> +++ b/doc/configuration.txt
> @@ -16671,6 +16671,26 @@ proxy-v2-options <option>[,<option>]*
>                  generated unique ID is also used for the first HTTP request
>                  within a Keep-Alive connection.
> 
> +  Besides, you can specify the type of TLV numerically instead.
> +
> +  Example 1:
> +  server example 127.0.0.1:2319 send-proxy-v2 proxy-v2-options crc32c,0xEE
> +
> +  The following logic applies: By default, the remote frontend connection is
> +  searched for the value 0xEE. If found, it is simply forwarded. Otherwise,
> +  a TLV with the ID "0xEE" and empty payload is sent out. In addition, crc32c
> +  is also set here.
> +
> +  You can also specify a key-value pair <id>=<fmt> syntax. <id> represents 
> the
> +  8 bit TLV type field in the range of 0 to 255. It can be expressed in 
> decimal
> +  or hexadecimal format (prefixed by "0x").
> +
> +  Example 2:
> +  server example_server 127.0.0.1:2319 send-proxy-v2 proxy-v2-options 
> 0xEE=%[str("foo")]
> +
> +  This will always send out the value "foo". Another common use case would 
> be to
> +  reference a variable.

I'm really having an issue with using "the remote frontend connection"
here. We've done this mistake in the past with the transparent mode that
connects to the original destination address, that resulted in "set-dst"
having to be implemented, then being broken by multiplexed streams (e.g.
h2), then having to be internally emulated at various layers (connection,
session, stream, transaction etc). Same for src. The solution is not
durable at all, and as you noticed, you already had to implement the
ability to modify these fields before passing them, hence reintroducing
that class of trouble on a new area. Also what will be left in the logs
after you modify the contents ? etc.

Given that you had previously contributed the ability to fetch a TLV
field from the front connection's proxy protocol using fc_pp_tlv(),
I'd rather make use of these ones explicitly. If you need to change
them, then you just assign them to a variable and edit that variable,
then send the variable. Or you can change them in-flight using a
converter. Of course it would make configs slightly more verbose, but
they would always do the right thing and we would not have to continue
to special-case them over time because they break expectations like
set-src/set-dst did.

I'm seeing another issue with passing type=fmt like this. The option is
currently defined as cutting around commas, and that's what it does. This
means that the format will not be parsable (no sample fetch functions with
more than one argument, no converters). Example:

   proxy-v2-options 0xEE=%[var(txn.clientname,_unknown_)]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^
                    First option              Second option

Also the syntax in the format something=value is not that great for config
parsers and generators. However it looks like our server keywords support
parenthesis, so I think that instead you could do the following:

   proxy-v2-options(0xEE) %[var(txn.clientname,_unknown_)]

In this case the proxy-v2-options keyword would only add one option and
take a log-format, and you'd add as many such keywords for each option.
Note that from a configuration perspective this is cleaner as it's easier
to maintain value pairs like this for config management. But I don't think
the option name is that great especially to send a single one. So maybe
this one instead should be "proxy-v2-option(type) <fmt>" though it may
maintain some confusion (option vs options). The other solution might be
to have a shorter one, which is also nicer when using many of these, e.g.
"proxy-v2-opt()".

Such an approach would basically split the proxy-v2 usage in two groups:
  - options that are either there or not there (proxy-v2-options)
  - options that need to take a value (proxy-v2-opt()).

I've been wondering if we want to use a format string or raw binary data
here, because some options are binary and nothing indicates that this will
not continue in the future. However since fc_pp_tlv() returns a string, I
guess most of them will continue this way. For set-var() we're using a raw
typed expression, but then we created set-var-fmt() to have one that takes
a log format. Just thinking loud, but then what about "proxy-v2-optfmt()"
or even "proxy-v2-tlvfmt()" or "pp-tlv-fmt()" (after all when speaking
about tlv we imply v2, right?). That would allow us later to implement
"pp-tlv()" taking an expression if really needed, e.g. to pass a DER
certificate, a SHA256, a sub-tlv or any such a thing.

Generally speaking I'm fine with the approach of making the output PP TLV
configurable, I think it does bring some value and flexibility in general.
That's also why I'm extremely careful about the config shortcomings that
can come with it, because I suspect it could become popular and we really
want to avoid new fall traps.

I think that most of your patch will remain as-is, it's only the parsing
part which needs to change a little bit to take such points into account.
However as you understand I really don't want to modify existing values
in-place but since variables and converters can do the job, that's likely
the best approach.

Also once this is merged, a new problem will emerge. There's still no
registry for the PPv2 TLV types. Years ago I've been suggested to write
an RFC about it but I just can't take two more hours of my daily sleep
time to work on that. I'm seeing that the proxy-protocol doc already
reserves a two ranges (one for locally-administered values, one for
experiment). I think that the doc for the keyword should put a big
caution about this, suggesting to have a look at that section and not
to pick random values out of nowhere.

Willy

Reply via email to