Hi Ruei-Bang, On Tue, Jan 09, 2024 at 07:18:14PM +0000, Ruei-Bang Chen wrote: > Hi Willy, > > Hope you are doing well in the New Year!
Yep, thank you! > I just want to bump this thread so that we can continue the discussion. > > I understand you probably have a lot of emails to catch up on recently so > there is no urgency for this. Just want to make sure this thread does not get > lost. I've looked everywhere, both in my own mbox and the list archive and couldn't find any trace of it, so I suspect that it possibly remained only in your outbox, or was victim of a hiccup on the send side, so thanks for resending! Some comments below. > ________________________________ > From: Ruei-Bang Chen <ruec...@linkedin.com> > Sent: Tuesday, December 12, 2023 5:32 PM > To: Willy Tarreau <w...@1wt.eu> > Cc: haproxy@formilux.org <haproxy@formilux.org> > Subject: Re: [PATCH] MEDIUM: sample: Modify fetchers for req.hdrs and > res.hdrs to selectively include / exclude headers > > Hi Willy, > > Sorry for the late reply. I know it has been some time since our last > discussion. > I finally have a chance to follow up on this after working on some other > tasks > and the Thanksgiving break. > > What I'm suspecting is that if > users find it useful, it will not be long before some ask for a way to > designate prefixes (e.g. select x-companyname-*, or exclude accept-* etc). > The way you did it will make it easy to extend it for this, for example, > by appending '*' to a header name, so that's fine. Hmmm well, '*' is > actually permitted in header field names, so maybe another solution could > be that we consider that we'd use prefixes by default and terminante names > with ':' to designate full names. E.g. req.hdrs(accept:) would match only > "accept" while req.hdrs(accept) would also match accept-encoding, > accept-range etc. It's just up to us to decide (and you first since you're > the first one to need this, so the ability to extend this and/or to factor > arguments may be relevant to your use case. > Yeah, I think that would be a possible ask / need from the client. > Currently, we don't have the immediate need for this but I agree at some point > this feature might come in handy even for us as well. I am thinking maybe > it can be left as a separate follow-up patch after this one? Let me know if > you > feel strongly that we should include this in the current patch. Yes, I'm perfectly fine with a later follow-up. I mentioned this to make sure that we think about it early, in case it requires to refine the syntax *before* we corner ourselves at the wrong place, because once it's part of a released version, you cannot change it in a breaking way anymore. For example if we decide that an argument currently is an exact match, it cannot later become a prefix. We still have some choice of other chars to denote prefixes if needed, as listed in RFC9110 section 5.6.2 (header field names are tokens): Tokens are short textual identifiers that do not include whitespace or delimiters. token = 1*tchar tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA ; any VCHAR, except delimiters Many HTTP field values are defined using common syntax components, separated by whitespace or specific delimiting characters. Delimiters are chosen from the set of US-ASCII visual characters not allowed in a token (DQUOTE and "(),/:;<=>?@[\]{}"). Parenthesis and comma are already used to delimit the names in the expression, but others like "@", "?" or "/" would not cause problems if needed. > Similarly I'm seeing that having the '!' in the first argument does add > some special cases everywhere down the inner loop. Maybe just having a > new name such as "req.hdrs_exc()" to enumerate exclusion would simplify > everything, and possibly make it easier even for APIs which will feed > these arguments, so as to remove the special case of the first argument. > I don't know, but feel free to explore such possibilities, it's important > to think about how configs will be used and updated. Very often the amount > of work needed to make the core easy to use is less than the work needed > externally to adapt to it ;-) > This makes sense. After discussing within the team, we think it might be > easier > to just add separate new fetchers like "req.hdrs_inc", "req.hdrs_exc", > "res.hdrs_inc", > "res.hdrs_exc" to be clearer what each fetcher is doing. This way, we can > also make > sure to not impact the existing flow for "req.hdrs" and "res.hdrs", wondering > what > are your thoughts on this? This could also be an idea, indeed. However, do you think about also adding the _bin variant that builds a serialized binary block ? I think it's not much a change, but if not done, it could be strange that only the unrestricted list has a binary version and not the others. However I do agree with you that it could simplify the function's implementation to have distinct ones for all/inc/exc. If we're purist, I think that hdrs() and hdrs_exc() would be exactly the same, considering that hdrs_exc() would dump all headers except those listed, so if none are listed, all are dumped. So conversely we could also think that we could have hdrs_only() that contains the prefixes of the only headers to list and that an empty list dumps nothing, and hdrs() becoming an alias for hdrs_exc() that dumps all except the optional ones listed. In any case, those variants look pretty close to each other and I think they're all fine and more or less a matter of taste and limited code & doc duplication. If you or anyone else has a strong opinion in favor of one choice or against one, I guess I'd be easy to convince either way ;-) I think that it's more important to think forward about the syntax for header selection so we can later support prefixes without breaking the syntax. Another possibility (just to give you some fuel) would be to consider only hdrs() with the following rules: - by default all names are dumped - if an argument starts with "/", this name is excluded - if an argument starts with anything but "/", a match is sufficient - if an argument ends in "?", it's a prefix That would allow in a single rule to perform some nice selection. Example: hdrs() => dumps all headers hdrs(/x-?) => all but x-anything hdrs(accept-?) => only anything starting with accept- hdrs(accept-encoding,accept-charset,user-agent): only these ones hdrs(/accept-charset,accept-?,user-agent): all accept* but accept-charset, and user-agent hdrs(/x-company-secret-?,x-company-?) => only x-company-* but not x-company-secret* See ? It's not necessarily more difficult to implement, because for each header found, you'd need to iterate over the args this way: - if starts with a /, and matches, skip - if does not start with a /, and matches, accept - past last arg, accept That remains compatible with what we have (empty list dumps everything) and finally summarizes in enumerating rules that dictate what to accept or drop. I don't know if that helps nor if you like it, but it seems simpler to me from a code and doc perspective and possibly more flexible thanks to the ability to both include and exclude in the same rule (useful for prefixes). And in this mode we can decide to start without prefixes if that helps you, and just add them later using the "?" delimiter for example. I'm letting you explore and choose a solution around this, but I think it could make sense this way. Thanks, Willy