Hi Ruei-Bang,

On Tue, Jan 09, 2024 at 07:18:14PM +0000, Ruei-Bang Chen wrote:
> Hi Willy,
> 
> Hope you are doing well in the New Year!

Yep, thank you!

> I just want to bump this thread so that we can continue the discussion.
> 
> I understand you probably have a lot of emails to catch up on recently so
> there is no urgency for this. Just want to make sure this thread does not get
> lost.

I've looked everywhere, both in my own mbox and the list archive and
couldn't find any trace of it, so I suspect that it possibly remained
only in your outbox, or was victim of a hiccup on the send side, so
thanks for resending!

Some comments below.

> ________________________________
> From: Ruei-Bang Chen <ruec...@linkedin.com>
> Sent: Tuesday, December 12, 2023 5:32 PM
> To: Willy Tarreau <w...@1wt.eu>
> Cc: haproxy@formilux.org <haproxy@formilux.org>
> Subject: Re: [PATCH] MEDIUM: sample: Modify fetchers for req.hdrs and 
> res.hdrs to selectively include / exclude headers
> 
> Hi Willy,
> 
> Sorry for the late reply. I know it has been some time since our last 
> discussion.
> I finally have a chance to follow up on this  after working on some other 
> tasks
> and the Thanksgiving break.
> 
> What I'm suspecting is that if
> users find it useful, it will not be long before some ask for a way to
> designate prefixes (e.g. select x-companyname-*, or exclude accept-* etc).
> The way you did it will make it easy to extend it for this, for example,
> by appending '*' to a header name, so that's fine. Hmmm well, '*' is
> actually permitted in header field names, so maybe another solution could
> be that we consider that we'd use prefixes by default and terminante names
> with ':' to designate full names. E.g. req.hdrs(accept:) would match only
> "accept" while req.hdrs(accept) would also match accept-encoding,
> accept-range etc. It's just up to us to decide (and you first since you're
> the first one to need this, so the ability to extend this and/or to factor
> arguments may be relevant to your use case.
> Yeah, I think that would be a possible ask / need from the client.
> Currently, we don't have the immediate need for this but I agree at some point
> this feature might come in handy even for us as well. I am thinking maybe
> it can be left as a separate follow-up patch after this one? Let me know if 
> you
> feel strongly that we should include this in the current patch.

Yes, I'm perfectly fine with a later follow-up. I mentioned this to make
sure that we think about it early, in case it requires to refine the syntax
*before* we corner ourselves at the wrong place, because once it's part
of a released version, you cannot change it in a breaking way anymore.
For example if we decide that an argument currently is an exact match,
it cannot later become a prefix. We still have some choice of other chars
to denote prefixes if needed, as listed in RFC9110 section 5.6.2 (header
field names are tokens):

  Tokens are short textual identifiers that do not include whitespace or
  delimiters.

  token          = 1*tchar
  tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                 / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
                 / DIGIT / ALPHA
                 ; any VCHAR, except delimiters

  Many HTTP field values are defined using common syntax components, separated
  by whitespace or specific delimiting characters. Delimiters are chosen from
  the set of US-ASCII visual characters not allowed in a token (DQUOTE and
  "(),/:;<=>?@[\]{}").

Parenthesis and comma are already used to delimit the names in the
expression, but others like "@", "?" or "/" would not cause problems
if needed.

> Similarly I'm seeing that having the '!' in the first argument does add
> some special cases everywhere down the inner loop. Maybe just having a
> new name such as "req.hdrs_exc()" to enumerate exclusion would simplify
> everything, and possibly make it easier even for APIs which will feed
> these arguments, so as to remove the special case of the first argument.
> I don't know, but feel free to explore such possibilities, it's important
> to think about how configs will be used and updated. Very often the amount
> of work needed to make the core easy to use is less than the work needed
> externally to adapt to it ;-)
> This makes sense. After discussing within the team, we think it might be 
> easier
> to just add separate new fetchers like "req.hdrs_inc", "req.hdrs_exc", 
> "res.hdrs_inc",
> "res.hdrs_exc" to be clearer what each fetcher is doing. This way, we can 
> also make
> sure to not impact the existing flow for "req.hdrs" and "res.hdrs", wondering 
> what
> are your thoughts on this?

This could also be an idea, indeed. However, do you think about also adding
the _bin variant that builds a serialized binary block ? I think it's not
much a change, but if not done, it could be strange that only the
unrestricted list has a binary version and not the others. However I do
agree with you that it could simplify the function's implementation to have
distinct ones for all/inc/exc.

If we're purist, I think that hdrs() and hdrs_exc() would be exactly the
same, considering that hdrs_exc() would dump all headers except those
listed, so if none are listed, all are dumped. So conversely we could
also think that we could have hdrs_only() that contains the prefixes of
the only headers to list and that an empty list dumps nothing, and hdrs()
becoming an alias for hdrs_exc() that dumps all except the optional ones
listed.

In any case, those variants look pretty close to each other and I think
they're all fine and more or less a matter of taste and limited code & doc
duplication. If you or anyone else has a strong opinion in favor of one
choice or against one, I guess I'd be easy to convince either way ;-)

I think that it's more important to think forward about the syntax for
header selection so we can later support prefixes without breaking the
syntax.

Another possibility (just to give you some fuel) would be to consider
only hdrs() with the following rules:

  - by default all names are dumped
  - if an argument starts with "/", this name is excluded
  - if an argument starts with anything but "/", a match is sufficient
  - if an argument ends in "?", it's a prefix

That would allow in a single rule to perform some nice selection.
Example:

    hdrs()    => dumps all headers
    hdrs(/x-?) => all but x-anything
    hdrs(accept-?) => only anything starting with accept-
    hdrs(accept-encoding,accept-charset,user-agent): only these ones
    hdrs(/accept-charset,accept-?,user-agent): all accept* but accept-charset, 
and user-agent
    hdrs(/x-company-secret-?,x-company-?) => only x-company-* but not 
x-company-secret*

See ? It's not necessarily more difficult to implement, because for
each header found, you'd need to iterate over the args this way:
  - if starts with a /, and matches, skip
  - if does not start with a /, and matches, accept
  - past last arg, accept

That remains compatible with what we have (empty list dumps everything)
and finally summarizes in enumerating rules that dictate what to accept
or drop. 

I don't know if that helps nor if you like it, but it seems simpler to
me from a code and doc perspective and possibly more flexible thanks to
the ability to both include and exclude in the same rule (useful for
prefixes). And in this mode we can decide to start without prefixes if
that helps you, and just add them later using the "?" delimiter for
example.

I'm letting you explore and choose a solution around this, but I think
it could make sense this way.

Thanks,
Willy

Reply via email to