Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-09-11 Thread Willy Tarreau
Hi Alexander,

On Mon, Sep 11, 2023 at 03:44:16PM +, Stephan, Alexander wrote:
> Hi Willy and Ilya,
> 
> Sorry for the absence, I was mostly out-of-office the last week. I am really
> sorry for causing this bug.

No worries, stuff like this happens, that's why we have regtests, CI,
this development cycle, and we run all development versions in production
on haproxy.org. I could have caught it as well when merging your patches
and didn't either.

> Thanks so much Willy for fixing it as well as for the pool inspection feature
> I saw today, that looks really handy.

Yep, actually I figured it could help a lot in some contexts like CI
where we don't have the cores, or even some issue reports where we
can't reasonably ask the user to send the core nor type complex commands.
And I thought that this example was a perfect one since it triggered under
vtest, so I rolled back to the commit, applied my patches on top of it and
tried again.

> I actually debugged the code for quite some time as well, I also tried
> analyzers but sadly was not able to find it.

That's the problem with such bugs that trigger at exactly one size. You'd
have needed a fuzzer, and even then... Too much time needed to fall by
pure luck on the problem. CI and prod are much more efficient ;-)

> While looking through the CI history, I found
> https://github.com/haproxy/haproxy/runs/16184951880, also related to TLVs and
> only failing on BSD.
> I guess, this is actual flakiness then? I didn't find any related bug fix yet.

I think so because locally I'm seeing random tests fail on FreeBSD and
I don't know why. We don't monitor this one often and already found it
broken in the past. There might be a bug hitting this platform, or some
flakiness of vtest there, I don't know for now. We'll figure it before
the release anyway :-)

Cheers,
Willy



RE: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-09-11 Thread Stephan, Alexander
Hi Willy and Ilya,

Sorry for the absence, I was mostly out-of-office the last week. I am really 
sorry for causing this bug.
Thanks so much Willy for fixing it as well as for the pool inspection feature I 
saw today, that looks really handy.
I actually debugged the code for quite some time as well, I also tried 
analyzers but sadly was not able to find it.

While looking through the CI history, I found 
https://github.com/haproxy/haproxy/runs/16184951880, also related to TLVs and 
only failing on BSD.
I guess, this is actual flakiness then? I didn’t find any related bug fix yet.

Best,
Alexander


From: Илья Шипицин 
Sent: Thursday, August 31, 2023 8:56 PM
To: Willy Tarreau 
Cc: Stephan, Alexander ; haproxy@formilux.org
Subject: Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY 
protocol v2 TLV values

You don't often get email from 
chipits...@gmail.com<mailto:chipits...@gmail.com>. Learn why this is 
important<https://aka.ms/LearnAboutSenderIdentification>
cirrus-ci backtrace

freebsd (cirrus-ci) crash · Issue #2275 · haproxy/haproxy 
(github.com)<https://github.com/haproxy/haproxy/issues/2275>

as usual, I'll send CI improvements once polished

чт, 31 авг. 2023 г. в 18:22, Илья Шипицин 
mailto:chipits...@gmail.com>>:
while trying to enable "gdb bt" on cirrus-ci, I noticed that we have similar 
crashes on musl (where gdb implemented already)

https://github.com/haproxy/haproxy/issues/2274

ср, 30 авг. 2023 г. в 05:29, Willy Tarreau mailto:w...@1wt.eu>>:
On Tue, Aug 29, 2023 at 11:16:32PM +0200,  ??? wrote:
> ??, 29 ???. 2023 ?. ? 16:45, Willy Tarreau mailto:w...@1wt.eu>>:
>
> > On Tue, Aug 29, 2023 at 04:31:31PM +0200, Willy Tarreau wrote:
> > > On Tue, Aug 29, 2023 at 02:16:55PM +, Stephan, Alexander wrote:
> > > > However, I noticed there is a problem now with the FreeBSD test. Have
> > you
> > > > already looked into it?
> > >
> > > Ah no, I had not noticed. I first pushed into a temporary branch and
> > > everything was OK so I pushed into master again without checking since
> > > it was the exact same commit.
> > >
> > > > I was not able to reproduce it for now. Looks like the test passes but
> > it
> > > > crashes afterwards? Maybe some SEGFAULT. Not sure... the CI on your
> > branch
> > > > looked good iirc.
> > >
> > > Hmmm no, signal 4 is SIGILL, we're using it for our asserts (BUG_ON()).
> > > I never know how to retrieve the stderr log from these tests, there are
> > > too many layers for me, I've been explained several times but cannot
> > > memorize it.
> > >
> > > I'm rebuilding here on a freebsd machine, hoping to see it happen again.
> > >
> > > Note that we cannot rule out a transient issue such as a temporary memory
> > > shortage or too long a CPU pause as such VMs are usually overloaded.
> > > Maybe it has nothing to do with your new test but just randomly triggered
> > > there, or maybe it put the light on a corner case. At least it's not a
> > > segfault or such a crash that would indicate a pointer misuse.
> >
> > I'm getting totally random results from vtest on FreeBSD, I don't know
> > why. I even tried to limit to one parallel process and am still getting
> > on average 2 errors per run, most always on SSL but not only. Some of
> > them include connection failures (sC) with 503 being returned and showing
> > "Timeout during SSL handshake". I'm not getting the SIGILL though. Thus
> > I don't know what to think about it, especially since it previously worked.
> > We'll see if it happens again on next push.
> >
> > Willy
> >
>
> we can disable cirrus-ci, not sure what is the purpose of randomly failing
> CI :)

I think it's still used for arm builds or something like this. Regardless,
given that freebsd randomly fails on my local platform,there might be
something else. I remember that cirrus used to fail some time ago, I
don't know what its status is right now.

Willy


Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-31 Thread Илья Шипицин
cirrus-ci backtrace

freebsd (cirrus-ci) crash · Issue #2275 · haproxy/haproxy (github.com)


as usual, I'll send CI improvements once polished

чт, 31 авг. 2023 г. в 18:22, Илья Шипицин :

> while trying to enable "gdb bt" on cirrus-ci, I noticed that we have
> similar crashes on musl (where gdb implemented already)
>
> https://github.com/haproxy/haproxy/issues/2274
>
> ср, 30 авг. 2023 г. в 05:29, Willy Tarreau :
>
>> On Tue, Aug 29, 2023 at 11:16:32PM +0200,  ??? wrote:
>> > ??, 29 ???. 2023 ?. ? 16:45, Willy Tarreau :
>> >
>> > > On Tue, Aug 29, 2023 at 04:31:31PM +0200, Willy Tarreau wrote:
>> > > > On Tue, Aug 29, 2023 at 02:16:55PM +, Stephan, Alexander wrote:
>> > > > > However, I noticed there is a problem now with the FreeBSD test.
>> Have
>> > > you
>> > > > > already looked into it?
>> > > >
>> > > > Ah no, I had not noticed. I first pushed into a temporary branch and
>> > > > everything was OK so I pushed into master again without checking
>> since
>> > > > it was the exact same commit.
>> > > >
>> > > > > I was not able to reproduce it for now. Looks like the test
>> passes but
>> > > it
>> > > > > crashes afterwards? Maybe some SEGFAULT. Not sure... the CI on
>> your
>> > > branch
>> > > > > looked good iirc.
>> > > >
>> > > > Hmmm no, signal 4 is SIGILL, we're using it for our asserts
>> (BUG_ON()).
>> > > > I never know how to retrieve the stderr log from these tests, there
>> are
>> > > > too many layers for me, I've been explained several times but cannot
>> > > > memorize it.
>> > > >
>> > > > I'm rebuilding here on a freebsd machine, hoping to see it happen
>> again.
>> > > >
>> > > > Note that we cannot rule out a transient issue such as a temporary
>> memory
>> > > > shortage or too long a CPU pause as such VMs are usually overloaded.
>> > > > Maybe it has nothing to do with your new test but just randomly
>> triggered
>> > > > there, or maybe it put the light on a corner case. At least it's
>> not a
>> > > > segfault or such a crash that would indicate a pointer misuse.
>> > >
>> > > I'm getting totally random results from vtest on FreeBSD, I don't know
>> > > why. I even tried to limit to one parallel process and am still
>> getting
>> > > on average 2 errors per run, most always on SSL but not only. Some of
>> > > them include connection failures (sC) with 503 being returned and
>> showing
>> > > "Timeout during SSL handshake". I'm not getting the SIGILL though.
>> Thus
>> > > I don't know what to think about it, especially since it previously
>> worked.
>> > > We'll see if it happens again on next push.
>> > >
>> > > Willy
>> > >
>> >
>> > we can disable cirrus-ci, not sure what is the purpose of randomly
>> failing
>> > CI :)
>>
>> I think it's still used for arm builds or something like this. Regardless,
>> given that freebsd randomly fails on my local platform,there might be
>> something else. I remember that cirrus used to fail some time ago, I
>> don't know what its status is right now.
>>
>> Willy
>>
>


Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-31 Thread Илья Шипицин
while trying to enable "gdb bt" on cirrus-ci, I noticed that we have
similar crashes on musl (where gdb implemented already)

https://github.com/haproxy/haproxy/issues/2274

ср, 30 авг. 2023 г. в 05:29, Willy Tarreau :

> On Tue, Aug 29, 2023 at 11:16:32PM +0200,  ??? wrote:
> > ??, 29 ???. 2023 ?. ? 16:45, Willy Tarreau :
> >
> > > On Tue, Aug 29, 2023 at 04:31:31PM +0200, Willy Tarreau wrote:
> > > > On Tue, Aug 29, 2023 at 02:16:55PM +, Stephan, Alexander wrote:
> > > > > However, I noticed there is a problem now with the FreeBSD test.
> Have
> > > you
> > > > > already looked into it?
> > > >
> > > > Ah no, I had not noticed. I first pushed into a temporary branch and
> > > > everything was OK so I pushed into master again without checking
> since
> > > > it was the exact same commit.
> > > >
> > > > > I was not able to reproduce it for now. Looks like the test passes
> but
> > > it
> > > > > crashes afterwards? Maybe some SEGFAULT. Not sure... the CI on your
> > > branch
> > > > > looked good iirc.
> > > >
> > > > Hmmm no, signal 4 is SIGILL, we're using it for our asserts
> (BUG_ON()).
> > > > I never know how to retrieve the stderr log from these tests, there
> are
> > > > too many layers for me, I've been explained several times but cannot
> > > > memorize it.
> > > >
> > > > I'm rebuilding here on a freebsd machine, hoping to see it happen
> again.
> > > >
> > > > Note that we cannot rule out a transient issue such as a temporary
> memory
> > > > shortage or too long a CPU pause as such VMs are usually overloaded.
> > > > Maybe it has nothing to do with your new test but just randomly
> triggered
> > > > there, or maybe it put the light on a corner case. At least it's not
> a
> > > > segfault or such a crash that would indicate a pointer misuse.
> > >
> > > I'm getting totally random results from vtest on FreeBSD, I don't know
> > > why. I even tried to limit to one parallel process and am still getting
> > > on average 2 errors per run, most always on SSL but not only. Some of
> > > them include connection failures (sC) with 503 being returned and
> showing
> > > "Timeout during SSL handshake". I'm not getting the SIGILL though. Thus
> > > I don't know what to think about it, especially since it previously
> worked.
> > > We'll see if it happens again on next push.
> > >
> > > Willy
> > >
> >
> > we can disable cirrus-ci, not sure what is the purpose of randomly
> failing
> > CI :)
>
> I think it's still used for arm builds or something like this. Regardless,
> given that freebsd randomly fails on my local platform,there might be
> something else. I remember that cirrus used to fail some time ago, I
> don't know what its status is right now.
>
> Willy
>


Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-29 Thread Willy Tarreau
On Tue, Aug 29, 2023 at 11:16:32PM +0200,  ??? wrote:
> ??, 29 ???. 2023 ?. ? 16:45, Willy Tarreau :
> 
> > On Tue, Aug 29, 2023 at 04:31:31PM +0200, Willy Tarreau wrote:
> > > On Tue, Aug 29, 2023 at 02:16:55PM +, Stephan, Alexander wrote:
> > > > However, I noticed there is a problem now with the FreeBSD test. Have
> > you
> > > > already looked into it?
> > >
> > > Ah no, I had not noticed. I first pushed into a temporary branch and
> > > everything was OK so I pushed into master again without checking since
> > > it was the exact same commit.
> > >
> > > > I was not able to reproduce it for now. Looks like the test passes but
> > it
> > > > crashes afterwards? Maybe some SEGFAULT. Not sure... the CI on your
> > branch
> > > > looked good iirc.
> > >
> > > Hmmm no, signal 4 is SIGILL, we're using it for our asserts (BUG_ON()).
> > > I never know how to retrieve the stderr log from these tests, there are
> > > too many layers for me, I've been explained several times but cannot
> > > memorize it.
> > >
> > > I'm rebuilding here on a freebsd machine, hoping to see it happen again.
> > >
> > > Note that we cannot rule out a transient issue such as a temporary memory
> > > shortage or too long a CPU pause as such VMs are usually overloaded.
> > > Maybe it has nothing to do with your new test but just randomly triggered
> > > there, or maybe it put the light on a corner case. At least it's not a
> > > segfault or such a crash that would indicate a pointer misuse.
> >
> > I'm getting totally random results from vtest on FreeBSD, I don't know
> > why. I even tried to limit to one parallel process and am still getting
> > on average 2 errors per run, most always on SSL but not only. Some of
> > them include connection failures (sC) with 503 being returned and showing
> > "Timeout during SSL handshake". I'm not getting the SIGILL though. Thus
> > I don't know what to think about it, especially since it previously worked.
> > We'll see if it happens again on next push.
> >
> > Willy
> >
> 
> we can disable cirrus-ci, not sure what is the purpose of randomly failing
> CI :)

I think it's still used for arm builds or something like this. Regardless,
given that freebsd randomly fails on my local platform,there might be
something else. I remember that cirrus used to fail some time ago, I
don't know what its status is right now.

Willy



Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-29 Thread Илья Шипицин
вт, 29 авг. 2023 г. в 16:45, Willy Tarreau :

> On Tue, Aug 29, 2023 at 04:31:31PM +0200, Willy Tarreau wrote:
> > On Tue, Aug 29, 2023 at 02:16:55PM +, Stephan, Alexander wrote:
> > > However, I noticed there is a problem now with the FreeBSD test. Have
> you
> > > already looked into it?
> >
> > Ah no, I had not noticed. I first pushed into a temporary branch and
> > everything was OK so I pushed into master again without checking since
> > it was the exact same commit.
> >
> > > I was not able to reproduce it for now. Looks like the test passes but
> it
> > > crashes afterwards? Maybe some SEGFAULT. Not sure... the CI on your
> branch
> > > looked good iirc.
> >
> > Hmmm no, signal 4 is SIGILL, we're using it for our asserts (BUG_ON()).
> > I never know how to retrieve the stderr log from these tests, there are
> > too many layers for me, I've been explained several times but cannot
> > memorize it.
> >
> > I'm rebuilding here on a freebsd machine, hoping to see it happen again.
> >
> > Note that we cannot rule out a transient issue such as a temporary memory
> > shortage or too long a CPU pause as such VMs are usually overloaded.
> > Maybe it has nothing to do with your new test but just randomly triggered
> > there, or maybe it put the light on a corner case. At least it's not a
> > segfault or such a crash that would indicate a pointer misuse.
>
> I'm getting totally random results from vtest on FreeBSD, I don't know
> why. I even tried to limit to one parallel process and am still getting
> on average 2 errors per run, most always on SSL but not only. Some of
> them include connection failures (sC) with 503 being returned and showing
> "Timeout during SSL handshake". I'm not getting the SIGILL though. Thus
> I don't know what to think about it, especially since it previously worked.
> We'll see if it happens again on next push.
>
> Willy
>

we can disable cirrus-ci, not sure what is the purpose of randomly failing
CI :)


Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-29 Thread Willy Tarreau
On Tue, Aug 29, 2023 at 04:31:31PM +0200, Willy Tarreau wrote:
> On Tue, Aug 29, 2023 at 02:16:55PM +, Stephan, Alexander wrote:
> > However, I noticed there is a problem now with the FreeBSD test. Have you
> > already looked into it?
> 
> Ah no, I had not noticed. I first pushed into a temporary branch and
> everything was OK so I pushed into master again without checking since
> it was the exact same commit.
> 
> > I was not able to reproduce it for now. Looks like the test passes but it
> > crashes afterwards? Maybe some SEGFAULT. Not sure... the CI on your branch
> > looked good iirc. 
> 
> Hmmm no, signal 4 is SIGILL, we're using it for our asserts (BUG_ON()).
> I never know how to retrieve the stderr log from these tests, there are
> too many layers for me, I've been explained several times but cannot
> memorize it.
> 
> I'm rebuilding here on a freebsd machine, hoping to see it happen again.
> 
> Note that we cannot rule out a transient issue such as a temporary memory
> shortage or too long a CPU pause as such VMs are usually overloaded.
> Maybe it has nothing to do with your new test but just randomly triggered
> there, or maybe it put the light on a corner case. At least it's not a
> segfault or such a crash that would indicate a pointer misuse.

I'm getting totally random results from vtest on FreeBSD, I don't know
why. I even tried to limit to one parallel process and am still getting
on average 2 errors per run, most always on SSL but not only. Some of
them include connection failures (sC) with 503 being returned and showing
"Timeout during SSL handshake". I'm not getting the SIGILL though. Thus
I don't know what to think about it, especially since it previously worked.
We'll see if it happens again on next push.

Willy



Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-29 Thread Willy Tarreau
On Tue, Aug 29, 2023 at 02:16:55PM +, Stephan, Alexander wrote:
> However, I noticed there is a problem now with the FreeBSD test. Have you
> already looked into it?

Ah no, I had not noticed. I first pushed into a temporary branch and
everything was OK so I pushed into master again without checking since
it was the exact same commit.

> I was not able to reproduce it for now. Looks like the test passes but it
> crashes afterwards? Maybe some SEGFAULT. Not sure... the CI on your branch
> looked good iirc. 

Hmmm no, signal 4 is SIGILL, we're using it for our asserts (BUG_ON()).
I never know how to retrieve the stderr log from these tests, there are
too many layers for me, I've been explained several times but cannot
memorize it.

I'm rebuilding here on a freebsd machine, hoping to see it happen again.

Note that we cannot rule out a transient issue such as a temporary memory
shortage or too long a CPU pause as such VMs are usually overloaded.
Maybe it has nothing to do with your new test but just randomly triggered
there, or maybe it put the light on a corner case. At least it's not a
segfault or such a crash that would indicate a pointer misuse.

Willy



RE: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-29 Thread Stephan, Alexander
Hi Willy,

> And I was wrong, they were indeed for the first one. However I had to also 
> remove the NOT_LAST from the intermediate patches using the list_for_each().
> I put quotes around the symbolic names in the doc to make it  clearer which 
> one was to be used and which one it corresponds to, as I had a moment of 
> hesitation when reading it the first time, so I profitted from this first 
> encounter to try to further limit doubts.
Thanks for the adjustments and clarifications. 

> That's now merged, thanks again!
Very happy that is merged now.

However, I noticed there is a problem now with the FreeBSD test. Have you 
already looked into it?
I was not able to reproduce it for now. Looks like the test passes but it 
crashes afterwards? Maybe some SEGFAULT. Not sure... the CI on your branch 
looked good iirc. 

Best,
Alexander



Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-29 Thread Willy Tarreau
On Tue, Aug 29, 2023 at 03:15:48PM +0200, Willy Tarreau wrote:
> Overall yes. There are just two small parts in the first patch that are
> for the immediately following patches ("refactor...") that I'm going to
> move there.

And I was wrong, they were indeed for the first one. However I had to
also remove the NOT_LAST from the intermediate patches using the
list_for_each(). I put quotes around the symbolic names in the doc to
make it clearer which one was to be used and which one it corresponds
to, as I had a moment of hesitation when reading it the first time, so
I profitted from this first encounter to try to further limit doubts.

> I'm doing that and merging it, or I'll get back to you if I have more
> questions. Thank you!

That's now merged, thanks again!
Willy



Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-29 Thread Willy Tarreau
Hi Alexander,

On Mon, Aug 28, 2023 at 12:38:45PM +, Stephan, Alexander wrote:
> > I'm fine with this, however I find that the doc is not very clear about
> > what is permitted
> I agree that doc needs some more details. I added the note about the
> iterations and described all the symbolic constants, see at the end of this
> mail.

Thanks.

> > Unless you're having any objection, I'm going to flip type and len below:
> Sorry, that was an oversight on my end. Sure, go ahead. I actually thought I
> would have arranged it that way, but it somehow slipped my view.

No worries, that's what reviews are made for, isn't it ? ;-)

> > In this case it could be clarified in the doc that the sample fetch
> > functions for authority/uniqueid only return the first of each, and that
> > fc_pp_tlv() iterates over all occurrences of the requested ID. This would
> > then place a clear separation between trying to extract THE authority, or
> > checking ALL TLVs of type AUTHORITY.
> 
> > What do you think ?
> Good idea, agreed. That also aligns with my goals to keep the existing
> behavior as much as possible.

OK great.

> > If you're OK with this, I'd appreciate it if you could send a few informal
> > incremental patches that I'd squash into yours.
> I'm OK with this. :-) The first patch (0001) should be squashed into the
> commit that introduces the fetch fc_pp_tlv.
> I guess it fits there because it actually retains and documents prior
> behavior. The behavior regarding duplicates is also already present,
> therefore I already added corresponding docs there.
> The second one  (0002) could then be applied to the last patch with the
> introduction of the constants. Would that work for you?

Overall yes. There are just two small parts in the first patch that are
for the immediately following patches ("refactor...") that I'm going to
move there.

I'm doing that and merging it, or I'll get back to you if I have more
questions. Thank you!

Willy



RE: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-28 Thread Stephan, Alexander
Hi Willy,

> finally back to this! Overall it's a great and very clean series, I really 
> want to thank you for this high quality work!
Thanks for the compliment, really glad to hear! :)

> Yeah it initially gave me a bit of head scratching when reading this part but 
> I understood what you did and find that it pretty much makes sense after all.
> It's not necessarily a hack given that what you've done consists in passing a 
> pre-parsed argument internally. It's not much different from the few cases 
> where we emulate older functions or actions using new mechanisms, building 
> some config elements on the fly just to parse them.
Okay, great if there is already similar code. Hack was a bit of an 
overstatement by me, at least if works a bit different than usually.

> I'm fine with this, however I find that the doc is not very clear about what 
> is permitted
I agree that doc needs some more details. I added the note about the iterations 
and described all the symbolic constants, see at the end of this mail.

> Unless you're having any objection, I'm going to flip type and len below:
Sorry, that was an oversight on my end. Sure, go ahead. I actually thought I 
would have arranged it that way, but it somehow slipped my view.

> I tend to *prefer* to have them separately for long series that take a lot of 
> time to review and cannot be done at once. But this series could be reviewed 
> at once, most patches remain small enough so that's no big deal.
OK, great. But I try to keep it in mind for future work anyway.

> In this case it could be clarified in the doc that the sample fetch functions 
> for authority/uniqueid only return the first of each, and that
> fc_pp_tlv() iterates over all occurrences of the requested ID. This would 
> then place a clear separation between trying to extract THE authority, or 
> checking ALL TLVs of type AUTHORITY.

> What do you think ?
Good idea, agreed. That also aligns with my goals to keep the existing behavior 
as much as possible.

> If you're OK with this, I'd appreciate it if you could send a few informal 
> incremental patches that I'd squash into yours.
I'm OK with this. :-) The first patch (0001) should be squashed into the commit 
that introduces the fetch fc_pp_tlv.
I guess it fits there because it actually retains and documents prior behavior. 
The behavior regarding duplicates is also already present,
therefore I already added corresponding docs there.
The second one  (0002) could then be applied to the last patch with the 
introduction of the constants. Would that work for you?

Best,
Alexander


0002-Extend-docs-of-fc_pp_tlv-with-constants.patch
Description: 0002-Extend-docs-of-fc_pp_tlv-with-constants.patch


0001-Restrict-fc_pp_authority-and-fc_pp_unique_id-to-firs.patch
Description:  0001-Restrict-fc_pp_authority-and-fc_pp_unique_id-to-firs.patch


Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-28 Thread Willy Tarreau
Hi Alexander,

finally back to this! Overall it's a great and very clean series, I really
want to thank you for this high quality work!

On Wed, Aug 16, 2023 at 04:24:21PM +, Stephan, Alexander wrote:
> I was not able to use a check function for authority and unique_id without
> modifying sample.c or allowing an argument. As far as I can tell, the check
> function is only executed for sample fetches that have not specified 0 for
> the arg function. I think it's okay this way since, semantically, those
> function do not handle arguments, so doing the argument setup internally
> makes sense IMO.

Yeah it initially gave me a bit of head scratching when reading this part
but I understood what you did and find that it pretty much makes sense
after all.

> If there is way to adjust this without any hacks, I will of course apply it.

It's not necessarily a hack given that what you've done consists in passing
a pre-parsed argument internally. It's not much different from the few cases
where we emulate older functions or actions using new mechanisms, building
some config elements on the fly just to parse them.

> For now, I have appended them to this message to not cause to much spam on
> the list while also keeping the previous, related discussion.
> If I should send them individually, please tell me and I will do so.

I tend to *prefer* to have them separately for long series that take a lot
of time to review and cannot be done at once. But this series could be
reviewed at once, most patches remain small enough so that's no big deal.

> Restructuring, especially in regard to pooling lead to quite a bit of
> changes, but the overall logic still applies and should hopefully be
> relatively straight forward.
> Actually, the code even got simpler in many places! :)

I noticed as well. I think I understood it all, which is a good point.

> BTW, I also added some TLV type constants in the 6th patch, feel free to
> merge it or ignore it. I think it helps with readability and is not really
> risky.

I'm fine with this, however I find that the doc is not very clear about
what is permitted:

 fc_pp_tlv() : string
   Returns the TLV value for the given TLV ID or type constant sent by the
   client in the PROXY protocol header, if any. TLV constants correspond
   to their type suffix as specified in the PPv2 spec, e.g., AUTHORITY.

It's not clear to me as a user that this  argument supports a
numeric value, nor what is the allowed range (I don't even know
personally). And given that you added a number of special keywords,
it's important that those recognized by the code are also mentioned
here. I would suggest something around:

 ... the given TLV ID which must either be a numeric value between
 XXX and YYY included, or one of the supported following symbolic
 names that correspond to the TLV constant suffixes in the PPv2 spec:
 AUTHORITY:x, ...

I would appreciate it if you can just reword this part with the ranges,
names and values according to what you know about it, I'll happily
apply it directly into your last patch.

> If there should be a nit that you quickly want to change, feel free to. I am
> not upset about it at all.

Unless you're having any objection, I'm going to flip type and len below:

 struct conn_tlv_list {
struct list list;
unsigned char type;
unsigned short len; // 65535 should be more than enough!
char value[0];
 } __attribute__((packed));

These result in  being unaligned, which is less clean on some archs,
and would also make the compiler complain if a pointer to len is ever
passed to a function taking a short*.

Finally, I noticed a behaviour change and I don't know if it's intentional
or not, but I think it needs to be discussed and possibly also documented.
Prior to your patchset, the authority and uniqueid were unique (I don't
know which of the first or last was kept, and I haven't checked but it
could even be possible that in case of duplicate it results in a memory
leak). But at least a single one was used. Now since you're setting
NOT_LAST on the sample, multiple such samples can be checked, e.g. in an
ACL that comes back and retrieves the next TLV of the same type when looking
for something. While I find this good, I do not think it is a good idea to
support that for important stuff like AUTHORITY or UNIQUEID: if an upstream
equipment can be fooled into sending more than one of each, the behavior
can be particularly confusing and may even be abused. I think this could
be addressed like this:

  /* fetch the authority TLV from a PROXY protocol header */
  int smp_fetch_fc_pp_authority(const struct arg *args, struct sample *smp, 
const char *kw, void *private)
  {
struct arg tlv_arg;
int ret;

set_tlv_arg(PP2_TYPE_AUTHORITY, &tlv_arg);
ret = smp_fetch_fc_pp_tlv(&tlv_arg, smp, kw, private);
smp->flags &= ~SMP_F_NOT_LAST; // return only the first authority
return ret;
  }

In this case it could be 

Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-25 Thread Willy Tarreau
Hi Alexander,

On Fri, Aug 25, 2023 at 09:34:08AM +, Stephan, Alexander wrote:
> Hi Willy,
> 
> Did you already have the chance to take a look at the updated patches?
> No hurry though, I just wanted to make sure that the message didn't get lost.

Not yet, I'm still burried under annoying bugs for now. I was just telling
a coworker that I intend to review your work early next week. I'm fairly
confident but it requires time.

Thanks!
Willy



RE: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-25 Thread Stephan, Alexander
Hi Willy,

Did you already have the chance to take a look at the updated patches?
No hurry though, I just wanted to make sure that the message didn't get lost.

As mentioned, I am aware that sending individual patches is better in the 
common case.
If that is a problem here, please just let me know.

Best,
Alexander



RE: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-16 Thread Stephan, Alexander
Hi Willy,

Thanks for the clarifications. I've implemented your requested changes and 
split my changes in 6 consecutive patches.

I was not able to use a check function for authority and unique_id without 
modifying sample.c or allowing an argument.
As far as I can tell, the check function is only executed for sample fetches 
that have not specified 0 for the arg function.
I think it's okay this way since, semantically, those function do not handle 
arguments, so doing the argument setup internally makes sense IMO.
If there is way to adjust this without any hacks, I will of course apply it.

For now, I have appended them to this message to not cause to much spam on the 
list while also keeping the previous, related discussion.
If I should send them individually, please tell me and I will do so.

Restructuring, especially in regard to pooling lead to quite a bit of changes, 
but the overall logic still applies and should hopefully be relatively straight 
forward.
Actually, the code even got simpler in many places! :)

BTW, I also added some TLV type constants in the 6th patch, feel free to merge 
it or ignore it.
I think it helps with readability and is not really risky.

If there should be a nit that you quickly want to change, feel free to. I am 
not upset about it at all.

Best,
Alexander

-Original Message-
From: Willy Tarreau  
Sent: Sunday, August 13, 2023 10:01 AM
To: Stephan, Alexander 
Cc: haproxy@formilux.org
Subject: Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY 
protocol v2 TLV values

[You don't often get email from w...@1wt.eu. Learn why this is important at 
https://aka.ms/LearnAboutSenderIdentification ]

Hi Alexander,

On Fri, Aug 11, 2023 at 02:08:37PM +, Stephan, Alexander wrote:
> Hi Willy,
>
> Thanks for the nice, detailed feedback.
> Overall, I agree with all of your listed points, so no need for further 
> discussions. ?
> I will hopefully send the separated patches at the beginning of next week.

OK! No rush anyway, such changes having a low impact can be merged late in the 
cycle if needed, so take your time.

> Just a comment and two small questions to make sure I got things correct:
>
> > As such I'd like them to be renamed to remove this confusion.
> > Maybe just call them HA_PP2_* ?
>
> Yes, such a prefix will clean it up quite nicely IMO. Will add that to 
> the first patch of the series.

Thanks.

> > [...]
> > It may even encourage us to create
> > smaller pools if ever deemed really useful (e.g. a 4- or 8- byte 
> > load balancing hash key for end-to-end consistency would only take a 
> > total of 32 or 40, malloc included).
>
> Just to make sure: Right now, we don't want to optimize further for 
> small TLVs other than removing the second pool for the values and 
> using the new struct with the VLA to reduce the overhead.
> For normal use, a pool with 160 B could  be used now to accommodate 
> for the new conn_tlv_list struct and 128 B TLVs (e.g., UNIQUE_ID)?
> For the authority type, it would then be a 255 + 32 B sized pool? 
> Maybe that could be used for the value range 128 <= x <= 255, and 
> then, for > 255, malloc?

Yes I think that will do the job (more like 128 < x < 256 BTW). I think you'd 
rather just focus on the type size (like you did) and not on the type itself, 
so that pools will automatically fit (i.e. have
128+sizeof(conn_tlv_list) and 256+sizeof() and the rest is for malloc().

> Other, smaller pools are future work?

Yes, as I'm not sure they're really that much used, and it will be easy to 
create smaller pools if we figure they're needed.

> >struct conn_tlv_list {
> >   struct list list;
> >unsigned short len; // 65535 should be more than enough!
> >unsigned char type;
> >char contents[0];
>  >   };
>
> I am also a bit confused about the second struct variant of this. IMO 
> this is the optimal struct layout in regards to size, that I would like to 
> use.
> What is the other struct for, where `len` and `type` are switched?

Ah, at first I didn't know what you were talking about, I remember having added 
the type while writing the message, it's just that I poorly pasted it the 
second time :-)  It's better to keep it as above, where types are better 
aligned and leave no hole. Hmmm BTW the struct will be padded, so you should 
use offsetof(conn_tlv_list, contents) rather than
sizeof(conn_tlv_list) for the size calculations. Or you can mark the struct 
with __attribute__((packed)), it will do the job as well. It's up to you.

> Thanks again for your efforts, it's really interesting for me to work 
> on HAProxy.

You're welcome, do not hesitate to send any question you may have.

Cheers,
Willy


0001-CLEANUP-MINOR-connection-Improve-co

Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-13 Thread Willy Tarreau
Hi Alexander,

On Fri, Aug 11, 2023 at 02:08:37PM +, Stephan, Alexander wrote:
> Hi Willy,
> 
> Thanks for the nice, detailed feedback.
> Overall, I agree with all of your listed points, so no need for further 
> discussions. ?
> I will hopefully send the separated patches at the beginning of next week.

OK! No rush anyway, such changes having a low impact can be merged late
in the cycle if needed, so take your time.

> Just a comment and two small questions to make sure I got things correct:
> 
> > As such I'd like them to be renamed to remove this confusion.
> > Maybe just call them HA_PP2_* ?
> 
> Yes, such a prefix will clean it up quite nicely IMO. Will add that to the
> first patch of the series.

Thanks.

> > [...]
> > It may even encourage us to create
> > smaller pools if ever deemed really useful (e.g. a 4- or 8- byte
> > load balancing hash key for end-to-end consistency would only
> > take a total of 32 or 40, malloc included).
> 
> Just to make sure: Right now, we don't want to optimize further for small
> TLVs other than removing the second pool for the values and using the new
> struct with the VLA to reduce the overhead.
> For normal use, a pool with 160 B could  be used now to accommodate for the
> new conn_tlv_list struct and 128 B TLVs (e.g., UNIQUE_ID)?
> For the authority type, it would then be a 255 + 32 B sized pool? Maybe that
> could be used for the value range 128 <= x <= 255, and then, for > 255,
> malloc?

Yes I think that will do the job (more like 128 < x < 256 BTW). I think
you'd rather just focus on the type size (like you did) and not on the
type itself, so that pools will automatically fit (i.e. have
128+sizeof(conn_tlv_list) and 256+sizeof() and the rest is for malloc().

> Other, smaller pools are future work?

Yes, as I'm not sure they're really that much used, and it will be easy
to create smaller pools if we figure they're needed.

> >struct conn_tlv_list {
> >   struct list list;
> >unsigned short len; // 65535 should be more than enough!
> >unsigned char type;
> >char contents[0];
>  >   };
> 
> I am also a bit confused about the second struct variant of this. IMO this is
> the optimal struct layout in regards to size, that I would like to use.
> What is the other struct for, where `len` and `type` are switched?

Ah, at first I didn't know what you were talking about, I remember having
added the type while writing the message, it's just that I poorly pasted
it the second time :-)  It's better to keep it as above, where types are
better aligned and leave no hole. Hmmm BTW the struct will be padded, so
you should use offsetof(conn_tlv_list, contents) rather than
sizeof(conn_tlv_list) for the size calculations. Or you can mark the struct
with __attribute__((packed)), it will do the job as well. It's up to you.

> Thanks again for your efforts, it's really interesting for me to work on
> HAProxy.

You're welcome, do not hesitate to send any question you may have.

Cheers,
Willy



RE: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-11 Thread Stephan, Alexander
Hi Willy,

Thanks for the nice, detailed feedback.
Overall, I agree with all of your listed points, so no need for further 
discussions. 😊
I will hopefully send the separated patches at the beginning of next week.

Just a comment and two small questions to make sure I got things correct:

> As such I'd like them to be renamed to remove this confusion.
> Maybe just call them HA_PP2_* ?

Yes, such a prefix will clean it up quite nicely IMO. Will add that to the 
first patch of the series.

> [...]
> It may even encourage us to create
> smaller pools if ever deemed really useful (e.g. a 4- or 8- byte
> load balancing hash key for end-to-end consistency would only
> take a total of 32 or 40, malloc included).

Just to make sure: Right now, we don't want to optimize further for small TLVs 
other than removing the second pool for the values and using the new struct 
with the VLA to reduce the overhead.
For normal use, a pool with 160 B could  be used now to accommodate for the new 
conn_tlv_list struct and 128 B TLVs (e.g., UNIQUE_ID)?
For the authority type, it would then be a 255 + 32 B sized pool? Maybe that 
could be used for the value range 128 <= x <= 255, and then, for > 255, malloc?
Other, smaller pools are future work?

>struct conn_tlv_list {
>   struct list list;
>unsigned short len; // 65535 should be more than enough!
>unsigned char type;
>char contents[0];
 >   };

I am also a bit confused about the second struct variant of this. IMO this is 
the optimal struct layout in regards to size, that I would like to use.
What is the other struct for, where `len` and `type` are switched?

Thanks again for your efforts, it's really interesting for me to work on 
HAProxy.

Best,
Alexander
-Original Message-
From: Willy Tarreau  
Sent: Thursday, August 10, 2023 9:18 AM
To: Stephan, Alexander 
Cc: haproxy@formilux.org
Subject: Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY 
protocol v2 TLV values

[You don't often get email from w...@1wt.eu. Learn why this is important at 
https://aka.ms/LearnAboutSenderIdentification ]

Hi Alexander,

On Mon, Jul 31, 2023 at 01:11:35PM +, Stephan, Alexander wrote:
> Dear HAProxy-maintainers,
>
> As proposed by my colleague Christian Menges in [1], I've implemented 
> support for fetching arbitrary TLV values for PROXY protocol V2 via a sample 
> fetch.

I'm afraid I don't remember seeing this message, and it was left without 
response, so it fell through the cracks, that's not cool for your colleague.

> It can be used by calling 'fc_pp_tlv' with the numerical value of the 
> desired TLV type. This also fixes issue [2].

Indeed.

> Some design considerations for my approach:
> o *Do not break the existing API*: Keep methods for authority and 
> unique ID, but refactor them internally with the updated, generified logic.

That's indeed preferable :-)

> o *Keep existing behavior and performance*: Pools for authority and 
> unique ID are still used. Already implemented parsing logic is still 
> applied. All information about a TLV payload that is already should be 
> used for validation.

OK.

> o *Small memory footprint*: As there might be up to 50k connections, 
> an array or hash map is too memory intensive. Therefore, a simple list 
> is used. It is the best choice here in my opinion, as there are 
> typically only a handful of TLVs.

I agree. At first I thought "Hmmm lists?" but we're speaking about very few 
items here, less than a handfull in general, so yes, I agree that the list 
might probably be the cheapest option.

> Additionally, memory pooling is used where possible. When TLV values 
> become too large there is a fallback to 'malloc'.

At first glance I like this approach. We can see in field how it deals with the 
traffic, but it should be OK. However I'm noticing that you put a limit to 127, 
which is a bit unfortunate, because we switched the sockaddr_storage to a pool 
a few years ago, and they're of size 128, so for one extra byte you could pick 
items from that shared pool.

> Note that I choose to not overwrite existing TLVs in the list. This 
> way, we return always the first found but would allow duplicate 
> entries. This would be relevant when there are duplicate TLVs.

Good point, I hadn't thought about this. And this is desirable because our ACLs 
can iterate over multiple instances of a sample. It seems that the current 
function smp_fetch_fc_pp_tlv() doesn't support it by the way, and will only 
return the first one. Please have a look at
smp_fetch_ssl_hello_alpn() or more generally some HTTP sample fetches that set 
SMP_F_NOT_LAST. This tells the ACL engine that if the last returned entry does 
not match, it can come back with the same context to re

Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-10 Thread Willy Tarreau
Hi Alexander,

On Mon, Jul 31, 2023 at 01:11:35PM +, Stephan, Alexander wrote:
> Dear HAProxy-maintainers,
> 
> As proposed by my colleague Christian Menges in [1], I've implemented support
> for fetching arbitrary TLV values for PROXY protocol V2 via a sample fetch.

I'm afraid I don't remember seeing this message, and it was left without
response, so it fell through the cracks, that's not cool for your colleague.

> It can be used by calling 'fc_pp_tlv' with the numerical value of the desired
> TLV type. This also fixes issue [2].

Indeed.

> Some design considerations for my approach:
> o *Do not break the existing API*: Keep methods for authority and unique ID,
> but refactor them internally with the updated, generified logic.

That's indeed preferable :-)

> o *Keep existing behavior and performance*: Pools for authority and unique ID
> are still used. Already implemented parsing logic is still applied. All
> information about a TLV payload that is already should be used for
> validation.

OK.

> o *Small memory footprint*: As there might be up to 50k connections, an array
> or hash map is too memory intensive. Therefore, a simple list is used. It is
> the best choice here in my opinion, as there are typically only a handful of
> TLVs.

I agree. At first I thought "Hmmm lists?" but we're speaking about very
few items here, less than a handfull in general, so yes, I agree that
the list might probably be the cheapest option.

> Additionally, memory pooling is used where possible. When TLV values
> become too large there is a fallback to 'malloc'.

At first glance I like this approach. We can see in field how it deals
with the traffic, but it should be OK. However I'm noticing that you put
a limit to 127, which is a bit unfortunate, because we switched the
sockaddr_storage to a pool a few years ago, and they're of size 128, so
for one extra byte you could pick items from that shared pool.

> Note that I choose to not overwrite existing TLVs in the list. This way, we
> return always the first found but would allow duplicate entries. This would
> be relevant when there are duplicate TLVs.

Good point, I hadn't thought about this. And this is desirable because
our ACLs can iterate over multiple instances of a sample. It seems that
the current function smp_fetch_fc_pp_tlv() doesn't support it by the way,
and will only return the first one. Please have a look at
smp_fetch_ssl_hello_alpn() or more generally some HTTP sample fetches that
set SMP_F_NOT_LAST. This tells the ACL engine that if the last returned
entry does not match, it can come back with the same context to retrieve
the next one, and so on. Once you reach the end, you just drop that flag
and the ACL engine has its final status. In your context you would store
a retry point (maybe even the list's pointer). There's even a
list_for_each_entry_from() to restart from a known point.

> Besides, I used 'strtoul' for the argument conversion to be consistent with
> other fetches that already use 'strtoul'.
> If 'strtoul' is too slow for this use case, I am not opposed to reverting it
> to the more efficient HAProxy helper.

That's not the right way to do it. When you declare a sample fetch
function, you can also declare a function that will check its arguments.
And that function can replace these args. It's done quite a bit to turn a
string to int after verifying a range for example. Please have a look at
sample_conv_mqtt_field_value_check() to see how that's used. This could
even allow you to implement symbolic names as aliases for know integer
values.

> *Important*: I will add the documentation after the code review, when the
> design is finalized.

Yeah I understand :-)

Now the patch below:

> diff --git a/include/haproxy/connection-t.h b/include/haproxy/connection-t.h
> index eb02bbcf2..a7ebed5ec 100644
> --- a/include/haproxy/connection-t.h
> +++ b/include/haproxy/connection-t.h
> @@ -501,6 +501,17 @@ struct conn_hash_params {
>   struct sockaddr_storage *dst_addr;
>  };
>  
> +/*
> + * This structure describes an TLV entry consisting of its type
> + * and corresponding payload. This can be used to construct a list
> + * from which arbitrary TLV payloads can be fetched.
> + */
> +struct conn_tlv_list {
> +struct list list;
> +struct ist value;
> +unsigned char type;
> +};
> +
>  /* This structure describes a connection with its methods and data.
>   * A connection may be performed to proxy or server via a local or remote
>   * socket, and can also be made to an internal applet. It can support
> @@ -534,10 +545,9 @@ struct connection {
>  
>   /* third cache line and beyond */
>   void (*destroy_cb)(struct connection *conn);  /* callback to notify of 
> imminent death of the connection */
> - struct sockaddr_storage *src; /* source address (pool), when known, 
> otherwise NULL */
> - struct sockaddr_storage *dst; /* destination address (pool), when 
> known, otherwise NULL */
> - struct ist proxy_authority;   /

Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY protocol v2 TLV values

2023-08-05 Thread Willy Tarreau
Hi Alexander,

On Mon, Jul 31, 2023 at 01:11:35PM +, Stephan, Alexander wrote:
> Dear HAProxy-maintainers,
> 
> As proposed by my colleague Christian Menges in [1], I've implemented support
> for fetching arbitrary TLV values for PROXY protocol V2 via a sample fetch.
> It can be used by calling 'fc_pp_tlv' with the numerical value of the desired
> TLV type. This also fixes issue [2].
(...)

Still not had the time to review your work in details, though at first
glance I suspect it looks fine. I hope to be able to review it next week
but at least wanted to ack receipt.

Thanks,
Willy