Re: 6.9/amd64 runaway acpi process on Thinkpad T580

2021-09-29 Thread Theo de Raadt
Mike Larkin  wrote:

> On Wed, Sep 29, 2021 at 08:44:54PM -0400, David Anthony wrote:
> > After enabling "BIOS Thunderbolt Assist", I experience consistent machine
> > slowdown on my T480. Previously, I experienced slowdown after power cycling
> > my machine occasionally. Currently, with this BIOS setting enabled, I
> > experience slowdown consistently.
> >
> > I am sorry but I don't know enough technically as to discern why. I am
> > simply reporting my user experience. I will re-disable the Thunderbolt
> > assist for now.
> >
> 
> If someone would build an ACPI_DEBUG kernel and show us what GPE is stuck
> then we can make forward progress (we need an acpidump of that machine
> also).
> 
Otherwise, it's like throwing darts in the dark.

Or, someone with a machine which has the problem can give it to Mike,
or a few other developers who understand this problem area.

I'm not joking.  Give it.  It would be a public service.
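For anyone volunteering, producing the debug data Mike asked for might look roughly like this. This is a sketch from memory, not verified commands: double-check the option name against options(4), the build steps against release(8)/config(8), and the flags against acpidump(8); the output path is a placeholder.

```
# Build and boot a kernel with ACPI_DEBUG enabled
cd /usr/src/sys/arch/amd64/conf
cp GENERIC.MP ACPIDEBUG
echo 'option ACPI_DEBUG' >> ACPIDEBUG
config ACPIDEBUG
cd ../compile/ACPIDEBUG
make obj && make && make install
reboot

# After reproducing the runaway acpi process, note the stuck GPE
# reported on the console/dmesg, and dump the ACPI tables to attach
# to the report (see acpidump(8) for the exact flags)
acpidump -o /tmp/t480-acpi
```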



Re: 6.9/amd64 runaway acpi process on Thinkpad T580

2021-09-29 Thread Daniel Wilkins
On Wed, Sep 29, 2021 at 06:29:08PM -0700, Mike Larkin wrote:
> On Wed, Sep 29, 2021 at 08:44:54PM -0400, David Anthony wrote:
> > After enabling "BIOS Thunderbolt Assist", I experience consistent machine
> > slowdown on my T480. Previously, I experienced slowdown after power cycling
> > my machine occasionally. Currently, with this BIOS setting enabled, I
> > experience slowdown consistently.
> >
> > I am sorry but I don't know enough technically as to discern why. I am
> > simply reporting my user experience. I will re-disable the Thunderbolt
> > assist for now.
> >
>
> If someone would build an ACPI_DEBUG kernel and show us what GPE is stuck
> then we can make forward progress (we need an acpidump of that machine
> also).
>
> Otherwise, it's like throwing darts in the dark.
>
> -ml

I could give it a shot. Do you want dumps for all three possible states?
(Disabled and working; disabled with acpi0 looping; enabled and working.)

It probably won't be until tomorrow since it's already pretty late,
though.

Danny



Re: 6.9/amd64 runaway acpi process on Thinkpad T580

2021-09-29 Thread Mike Larkin
On Wed, Sep 29, 2021 at 08:44:54PM -0400, David Anthony wrote:
> After enabling "BIOS Thunderbolt Assist", I experience consistent machine
> slowdown on my T480. Previously, I experienced slowdown after power cycling
> my machine occasionally. Currently, with this BIOS setting enabled, I
> experience slowdown consistently.
>
> I am sorry but I don't know enough technically as to discern why. I am
> simply reporting my user experience. I will re-disable the Thunderbolt
> assist for now.
>

If someone would build an ACPI_DEBUG kernel and show us what GPE is stuck
then we can make forward progress (we need an acpidump of that machine
also).

Otherwise, it's like throwing darts in the dark.

-ml

> On 9/29/21 2:58 PM, David Anthony wrote:
> > Another T480 user who has noticed the same problem. Per advice given,
> > I've just enabled "BIOS Thunderbolt Assist". I will report back if I
> > notice the problem persists.
> >
> > On 9/19/21 4:50 AM, Daniel Wilkins wrote:
> > > I've run into this on my T480, it seems most consistently triggered
> > > by power
> > > cycles caused by running out of battery. The bug's existed for quite
> > > a few
> > > years (I think I first noticed it in 2019.) If I recall correctly I've
> > > posted it to the list a couple of times but I don't think any
> > > concrete answers
> > > ever emerged; your report is more thorough than mine were though.
> > > I do remember that it never happened on my T430, but that's quite the
> > > hardware gap.
> > >
> >
>



Re: 6.9/amd64 runaway acpi process on Thinkpad T580

2021-09-29 Thread David Anthony
After enabling "BIOS Thunderbolt Assist", I experience consistent 
machine slowdown on my T480. Previously, I only occasionally experienced 
slowdown after power cycling my machine. Currently, with this BIOS setting 
enabled, I experience slowdown consistently.


I am sorry, but I don't know enough technically to discern why. I am 
simply reporting my user experience. I will re-disable the Thunderbolt 
assist for now.


On 9/29/21 2:58 PM, David Anthony wrote:
Another T480 user who has noticed the same problem. Per advice given, 
I've just enabled "BIOS Thunderbolt Assist". I will report back if I 
notice the problem persists.


On 9/19/21 4:50 AM, Daniel Wilkins wrote:
I've run into this on my T480; it seems most consistently triggered by power
cycles caused by running out of battery. The bug's existed for quite a few
years (I think I first noticed it in 2019.) If I recall correctly I've
posted it to the list a couple of times but I don't think any concrete answers
ever emerged; your report is more thorough than mine were though.
I do remember that it never happened on my T430, but that's quite the
hardware gap.







Re: 6.9/amd64 runaway acpi process on Thinkpad T580

2021-09-29 Thread David Anthony
Another T480 user who has noticed the same problem. Per advice given, 
I've just enabled "BIOS Thunderbolt Assist". I will report back if I 
notice the problem persists.


On 9/19/21 4:50 AM, Daniel Wilkins wrote:

I've run into this on my T480, it seems most consistently triggered by power
cycles caused by running out of battery. The bug's existed for quite a few
years (I think I first noticed it in 2019.) If I recall correctly I've
posted it to the list a couple of times but I don't think any concrete answers
ever emerged; your report is more thorough than mine were though.
I do remember that it never happened on my T430, but that's quite the
hardware gap.





Re: SOLVED Re: 6.9/amd64 runaway acpi process on Thinkpad T580

2021-09-29 Thread Daniel Wilkins
On Wed, Sep 29, 2021 at 11:47:34AM -0600, Theo de Raadt wrote:
> It would be great if someone figures out why disabling "BIOS Thunderbolt
> Assist" causes a pin to get stuck on resume, and/or figures out how we
> can recognize and handle/clear the event.

The help text for the BIOS option specifically mentions it as a Linux
workaround. Obviously patches couldn't be imported, but I'll poke
around to see if there's any discussion or a description of what
exactly is happening.

Aside from that is there any data I can send y'all? Jonathan's built up
a pretty comprehensive set of dmesgs at this point, it seems like.

(No need to cc me, I'm on misc@)

Danny



SOLVED Re: 6.9/amd64 runaway acpi process on Thinkpad T580

2021-09-29 Thread Jonathan Thornburg
Hi,

On 2021-09-28 14:18:49, Daniel Wilkins wrote:
> All you have to do is go into your bios' settings and turn on
> "BIOS Thunderbolt Assist" then everything will work 100% fine.
> 
> Thanks to jcs on IRC for pointing me at that (dunno what his
> email is.)

Success!  With this (and the 7.0 snapshot I installed yesterday; dmesg
in my message )
the problem is gone, and my T580 now does suspend/resume perfectly
(including idling with CPU usage under 1%).

A big thank-you to Daniel and to jcs (I'm guessing that's Joshua Stein,
https://jcs.org/) for the solution, and to Theo and Mike for their
suggestions too!

Thanks again,

--
-- "Jonathan Thornburg [remove color- to reply]" 
   on the west coast of Canada, eh?
   "There was of course no way of knowing whether you were being watched
at any given moment.  How often, or on what system, the Thought Police
plugged in on any individual wire was guesswork.  It was even conceivable
that they watched everybody all the time."  -- George Orwell, "1984"



Re: SOLVED Re: 6.9/amd64 runaway acpi process on Thinkpad T580

2021-09-29 Thread Theo de Raadt
Jonathan Thornburg  wrote:

> On 2021-09-28 14:18:49, Daniel Wilkins wrote:
> > All you have to do is go into your bios' settings and turn on
> > "BIOS Thunderbolt Assist" then everything will work 100% fine.
> > 
> > Thanks to jcs on IRC for pointing me at that (dunno what his
> > email is.)
> 
> Success!  With this (and the 7.0 snapshot I installed yesterday; dmesg
> in my message )
> the problem is gone, and my T580 now does suspend/resume perfectly
> (including idling with CPU usage under 1%).
> 
> A big thank-you to Daniel and to jcs (I'm guessing that's Joshua Stein,
> https://jcs.org/) for the solution, and to Theo and Mike for their
> suggestions too!

It would be great if someone figures out why disabling "BIOS Thunderbolt
Assist" causes a pin to get stuck on resume, and/or figures out how we
can recognize and handle/clear the event.




Re: Mellanox driver support details https://man.openbsd.org/mcx.4

2021-09-29 Thread Stuart Henderson
On 2021-09-29, Andrew Lemin  wrote:
> And to answer my last question about SMP capabilities, it looks like the
> only locking going on is when the driver is talking to the Kernel itself
> through kstat which would make sense. So yes it looks like mcx does have
> SMP support :)

$ cd /sys/dev/pci; grep pci_intr_establish_cpu *
if_bnxt.c:  bq->q_ihc = pci_intr_establish_cpu(sc->sc_pc, ih,
if_ix.c:    que->tag = pci_intr_establish_cpu(pa->pa_pc, ih,
if_ixl.c:   iv->iv_ihc = pci_intr_establish_cpu(sc->sc_pc, ih,
if_mcx.c:   q->q_ihc = pci_intr_establish_cpu(sc->sc_pc, ih,
if_vmx.c:   q->ih = pci_intr_establish_cpu(pa->pa_pc, ih,

> Well it's enough for me to buy a card from ebay to play with
> as the ConnectX-4 Lx cards are pretty cheap now.

new (from fs) aren't much more expensive either btw.

>> I was able to decipher some of it using this
>> https://www.mellanox.com/related-docs/user_manuals/Ethernet_Adapters_Programming_Manual.pdf
>> (this is very well written).

nvidia bought the company so that might be the high point in the documentation
for these..




Re: 6.9/amd64 runaway acpi process on Thinkpad T580

2021-09-29 Thread Daniel Wilkins
On Tue, Sep 28, 2021 at 10:08:47PM -0600, Theo de Raadt wrote:
> There are a few people who have experience with this.  Maybe one of
> them will mail you privately.
>

I'm glad this thread suddenly got revived, since I tried to find it
in my backlog but it got lost.

All you have to do is go into your bios' settings and turn on
"BIOS Thunderbolt Assist" then everything will work 100% fine.

Thanks to jcs on IRC for pointing me at that (dunno what his
email is.)



Re: Mellanox driver support details https://man.openbsd.org/mcx.4

2021-09-29 Thread Andrew Lemin
And to answer my last question about SMP capabilities: it looks like the
only locking going on is when the driver is talking to the kernel itself
through kstat, which would make sense. So yes, it looks like mcx does have
SMP support :) Well, it's enough for me to buy a card from ebay to play with
as the ConnectX-4 Lx cards are pretty cheap now.

Warning to others reading my comments, me poking around in kernel code is
akin to a blind person in a library before learning braille, so take
nothing I say as fact, merely optimistic opinion :)

On Wed, Sep 29, 2021 at 9:08 PM Andrew Lemin  wrote:

> So I think I have figured out some things Theo browsing through
> https://github.com/openbsd/src/blob/master/sys/dev/pci/if_mcx.c.
>
> I can see that some offloading is supported, but have not yet figured out
> how much is implemented yet. It looks like the offloading capability in
> these cards are much more granular than I have understood from previous
> hardware.
> I was able to decipher some of it using this
> https://www.mellanox.com/related-docs/user_manuals/Ethernet_Adapters_Programming_Manual.pdf
> (this is very well written).
>
> And I was quite excited to see what looks like the RDMA access support in
> the mcx driver! So we should be able to see the super low latency
> capabilities with this card :)
>
> I will keep pushing myself.. Thanks again Theo
>
> On Wed, Sep 29, 2021 at 2:21 PM Andrew Lemin 
> wrote:
>
>> Hi Theo :)
>>
>> Ok sure, I will put on my cape-of-courage and start reading the source..
>> I may be some time!
>>
>> On Wed, Sep 29, 2021 at 1:56 PM Theo de Raadt 
>> wrote:
>>
>>> We tend to keep our driver manual pages without detailed promises.
>>> They do ethernet, they do it best effort, etc.
>>>
>>> What you want to know can be found by reading the source, or the
>>> commit logs.  Since this is a locally written driver, the code is
>>> surprisingly approachable.
>>>
>>> Andrew Lemin  wrote:
>>>
>>> > Hi. I hope everyone is well and having a great day :)
>>> >
>>> > Just a quick question about the mcx (Mellanox 5th generation Ethernet
>>> > device) drivers
>>> > https://man.openbsd.org/mcx.4
>>> >
>>> > The man page says nothing more than it supports;
>>> > ConnectX-4 Lx EN
>>> > ConnectX-4 EN
>>> > ConnectX-5 EN
>>> > ConnectX-6 EN
>>> >
>>> > I am looking for some clarity on what features and performance
>>> > characteristics mcx boasts?
>>> >
>>> > For example are the following basic hardware features supported by this
>>> > driver?
>>> > IPv4 receive IP/TCP/UDP checksum offload
>>> > IPv4 transmit TCP/UDP checksum offload
>>> > VLAN tag insertion and stripping
>>> > interrupt coalescing
>>> >
>>> > And what other features does it support?
>>> >
>>> > I also came across a comment in some forum a while back (so high
>>> quality
>>> > information ) that mentioned Mellanox drivers in OpenBSD are SMP
>>> safe and
>>> > so not giant-locked. Is this true?
>>> >
>>> > Thanks, Andy,
>>>
>>


Re: Mellanox driver support details https://man.openbsd.org/mcx.4

2021-09-29 Thread Andrew Lemin
So I think I have figured out some things, Theo, browsing through
https://github.com/openbsd/src/blob/master/sys/dev/pci/if_mcx.c.

I can see that some offloading is supported, but have not yet figured out
how much is implemented. It looks like the offloading capability in
these cards is much more granular than I have understood from previous
hardware.
I was able to decipher some of it using this
https://www.mellanox.com/related-docs/user_manuals/Ethernet_Adapters_Programming_Manual.pdf
(this is very well written).

And I was quite excited to see what looks like RDMA access support in
the mcx driver! So we should be able to see the super low latency
capabilities with this card :)

I will keep pushing myself.. Thanks again Theo

On Wed, Sep 29, 2021 at 2:21 PM Andrew Lemin  wrote:

> Hi Theo :)
>
> Ok sure, I will put on my cape-of-courage and start reading the source.. I
> may be some time!
>
> On Wed, Sep 29, 2021 at 1:56 PM Theo de Raadt  wrote:
>
>> We tend to keep our driver manual pages without detailed promises.
>> They do ethernet, they do it best effort, etc.
>>
>> What you want to know can be found by reading the source, or the
>> commit logs.  Since this is a locally written driver, the code is
>> surprisingly approachable.
>>
>> Andrew Lemin  wrote:
>>
>> > Hi. I hope everyone is well and having a great day :)
>> >
>> > Just a quick question about the mcx (Mellanox 5th generation Ethernet
>> > device) drivers
>> > https://man.openbsd.org/mcx.4
>> >
>> > The man page says nothing more than it supports;
>> > ConnectX-4 Lx EN
>> > ConnectX-4 EN
>> > ConnectX-5 EN
>> > ConnectX-6 EN
>> >
>> > I am looking for some clarity on what features and performance
>> > characteristics mcx boasts?
>> >
>> > For example are the following basic hardware features supported by this
>> > driver?
>> > IPv4 receive IP/TCP/UDP checksum offload
>> > IPv4 transmit TCP/UDP checksum offload
>> > VLAN tag insertion and stripping
>> > interrupt coalescing
>> >
>> > And what other features does it support?
>> >
>> > I also came across a comment in some forum a while back (so high quality
>> > information ) that mentioned Mellanox drivers in OpenBSD are SMP safe
>> and
>> > so not giant-locked. Is this true?
>> >
>> > Thanks, Andy,
>>
>


Re: problems with outbound load-balancing (PF sticky-address for destination IPs)

2021-09-29 Thread Andrew Lemin
Ah,

Your diagram makes perfect sense now :) Thank you. So it does not have to
undergo a full rehashing of all links (which breaks _lots_ of sessions when
NAT is involved), but it also does not have to explicitly track anything in
memory, like you say. So better than full re-hashing and cheaper than
tracking.

PS: Thank you for confirming: "It therefore routes the same src/dst pair
over the same nexthop as long as there are no changes to the route".
I was getting hung up on the bit in the RFC that says "hash over the packet
header fields that identify a flow", so I was imagining the hashing was
using a lot of entropy, including the ports. I guess I should have thought
around that more and read it as "hash over the IP packet header fields that
identify a flow" ;)

I shall go and experiment :)
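Claudio's suggestion (multipath default routes, if-bound states, per-interface nat-to) might translate to something like the sketch below. The gateway addresses and interface names are placeholders, and the exact syntax should be checked against route(8), pf.conf(5), and the net.inet.ip.multipath sysctl documentation:

```
# enable multipath route selection
doas sysctl net.inet.ip.multipath=1

# install one default route per transit (placeholder gateways)
doas route add -mpath default 192.0.2.1
doas route add -mpath default 203.0.113.1

# pf.conf: bind states to their interface, NAT per egress link
set state-policy if-bound
match out on em1 inet nat-to (em1)
match out on em2 inet nat-to (em2)
```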


On Wed, Sep 29, 2021 at 8:45 PM Claudio Jeker 
wrote:

> On Wed, Sep 29, 2021 at 08:07:43PM +1000, Andrew Lemin wrote:
> > Hi Claudio,
> >
> > So you probably guessed I am using 'route-to { GW1, GW2, GW3, GW4 }
> random'
> > (and was wanting to add 'sticky-address' to this) based on your reply :)
> >
> > "it will make sure that selected default routes are sticky to source/dest
> > pairs" - Are you saying that even though multipath routing uses hashing
> to
> > select the path (https://www.ietf.org/rfc/rfc2992.txt - "The router
> first
> > selects a key by performing a hash (e.g., CRC16) over the packet header
> > fields that identify a flow."), subsequent new sessions to the same dest
> IP
> > with different source ports will still get the same path? I thought a new
> > session with a new tuple to the same dest IP would get a different hashed
> > path with multipath?
>
> OpenBSD multipath routing implements gateway selection by hash-threshold
> from RFC 2992. It therefore routes the same src/dst pair over the same
> nexthop as long as there are no changes to the route. If one of your
> links drops then some sessions will move links, but the goal of
> hash-threshold is to minimize the affected sessions.
>
> > "On rerouting the multipath code reshuffles the selected routes in a way
> to
> > minimize the affected sessions." - Are you saying, in the case where one
> > path goes down, it will migrate all the entries only for that failed path
> > onto the remaining good paths (like ecmp-fast-reroute ?)
>
> No, some session on good paths may also migrate to other links, this is
> how the hash-threshold algorithm works.
>
> Split with 4 nexthops; now let's assume link 2 dies and stuff gets
> reshuffled:
> +===========+===========+===========+===========+
> |  link 1   |  link 2   |  link 3   |  link 4   |
> +===========+===+=======+=======+===+===========+
> |    link 1     |    link 3     |    link 4     |
> +===============+===============+===============+
> Unaffected sessions for drop
>  ^^^^^^^^^^^             ^^^^^^^     ^^^^^^^^^^^
> Affected sessions because of drop
>              ###########         ###
> Using other ways to split the hash into buckets (e.g. a simple modulo)
> causes more change.
>
> Btw. using route-to with 4 gw will not detect a link failure and 25% of
> your traffic will be dropped. This is another advantage of multipath
> routing.
>
> Cheers
> --
> :wq Claudio
>
> > Thanks for your time, Andy.
> >
> > On Wed, Sep 29, 2021 at 5:21 PM Claudio Jeker 
> > wrote:
> >
> > > On Wed, Sep 29, 2021 at 02:17:59PM +1000, Andrew Lemin wrote:
> > > > I see this question died on its arse! :)
> > > >
> > > > This is still an issue for outbound load-balancing over multiple
> internet
> > > > links.
> > > >
> > > > PF's 'sticky-address' parameter only works on source IPs (because it
> was
> > > > originally designed for use when hosting your own server pools -
> inbound
> > > > load balancing).
> > > > I.e. There is no way to configure 'sticky-address' to consider
> > > destination
> > > > IPs for outbound load balancing, so all subsequent outbound
> connections
> > > to
> > > > the same target IP originate from the same internet connection.
> > > >
> > > > The reason why this is desirable is because an increasing number of
> > > > websites use single sign on mechanisms (quite a few different
> > > architectures
> > > > expose the issue described here). After a users outbound connection
> is
> > > > initially randomly load balanced onto an internet connection, their
> > > browser
> > > > is redirected into opening multiple additional sockets towards the
> > > > website's load balancers / cloud gateways, which redirect the
> connections
> > > > to different internal servers for different parts of the site/page,
> and
> > > the
> > > SSO authentication/cookies passed on the additional sockets must
> > > originate from the same IP as the original socket. As a result outbound
> > > load-balancing does not work for these sites.
> > > >
> > > > The 

Re: problems with outbound load-balancing (PF sticky-address for destination IPs)

2021-09-29 Thread Claudio Jeker
On Wed, Sep 29, 2021 at 08:07:43PM +1000, Andrew Lemin wrote:
> Hi Claudio,
> 
> So you probably guessed I am using 'route-to { GW1, GW2, GW3, GW4 } random'
> (and was wanting to add 'sticky-address' to this) based on your reply :)
> 
> "it will make sure that selected default routes are sticky to source/dest
> pairs" - Are you saying that even though multipath routing uses hashing to
> select the path (https://www.ietf.org/rfc/rfc2992.txt - "The router first
> selects a key by performing a hash (e.g., CRC16) over the packet header
> fields that identify a flow."), subsequent new sessions to the same dest IP
> with different source ports will still get the same path? I thought a new
> session with a new tuple to the same dest IP would get a different hashed
> path with multipath?

OpenBSD multipath routing implements gateway selection by hash-threshold
from RFC 2992. It therefore routes the same src/dst pair over the same
nexthop as long as there are no changes to the route. If one of your
links drops then some sessions will move links, but the goal of
hash-threshold is to minimize the affected sessions.

> "On rerouting the multipath code reshuffles the selected routes in a way to
> minimize the affected sessions." - Are you saying, in the case where one
> path goes down, it will migrate all the entries only for that failed path
> onto the remaining good paths (like ecmp-fast-reroute ?)

No, some session on good paths may also migrate to other links, this is
how the hash-threshold algorithm works.

Split with 4 nexthops; now let's assume link 2 dies and stuff gets
reshuffled:
+===========+===========+===========+===========+
|  link 1   |  link 2   |  link 3   |  link 4   |
+===========+===+=======+=======+===+===========+
|    link 1     |    link 3     |    link 4     |
+===============+===============+===============+
Unaffected sessions for drop
 ^^^^^^^^^^^             ^^^^^^^     ^^^^^^^^^^^
Affected sessions because of drop
             ###########         ###
Using other ways to split the hash into buckets (e.g. a simple modulo)
causes more change.

Btw. using route-to with 4 gw will not detect a link failure and 25% of
your traffic will be dropped. This is another advantage of multipath
routing.

Cheers
-- 
:wq Claudio
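A quick way to see the hash-threshold behaviour Claudio describes is to simulate it. The sketch below is an illustration of the RFC 2992 region-splitting idea only, not the kernel's actual code; the gateway names and flow strings are made up:

```python
import zlib

NEXTHOPS = ["gw1", "gw2", "gw3", "gw4"]

def hash_threshold(flow, nexthops, space=1 << 16):
    """Map a flow to a nexthop RFC 2992-style: hash the src/dst pair
    into a fixed key space and split that space into equal regions,
    one region per nexthop."""
    key = zlib.crc32(flow) & (space - 1)          # 16-bit flow key
    region = space // len(nexthops)
    return nexthops[min(key // region, len(nexthops) - 1)]

# Synthetic src -> dst pairs standing in for sessions.
flows = [f"10.0.0.{i} -> 198.51.100.{j}".encode()
         for i in range(50) for j in range(50)]

before = {f: hash_threshold(f, NEXTHOPS) for f in flows}
after = {f: hash_threshold(f, ["gw1", "gw3", "gw4"]) for f in flows}  # gw2 died

# Sessions that were on a surviving link but still changed nexthop:
# only those whose key sits near a region boundary.
moved = sum(1 for f in flows if before[f] != "gw2" and before[f] != after[f])
print(f"{moved / len(flows):.1%} of all sessions moved off a surviving link")
```

Only the boundary slices move (the # markers in the diagram above), whereas a naive `key % len(nexthops)` selector would reshuffle the majority of sessions when the nexthop count changes.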

> Thanks for your time, Andy.
> 
> On Wed, Sep 29, 2021 at 5:21 PM Claudio Jeker 
> wrote:
> 
> > On Wed, Sep 29, 2021 at 02:17:59PM +1000, Andrew Lemin wrote:
> > > I see this question died on its arse! :)
> > >
> > > This is still an issue for outbound load-balancing over multiple internet
> > > links.
> > >
> > > PF's 'sticky-address' parameter only works on source IPs (because it was
> > > originally designed for use when hosting your own server pools - inbound
> > > load balancing).
> > > I.e. There is no way to configure 'sticky-address' to consider
> > destination
> > > IPs for outbound load balancing, so all subsequent outbound connections
> > to
> > > the same target IP originate from the same internet connection.
> > >
> > > The reason why this is desirable is because an increasing number of
> > > websites use single sign on mechanisms (quite a few different
> > architectures
> > > expose the issue described here). After a users outbound connection is
> > > initially randomly load balanced onto an internet connection, their
> > browser
> > > is redirected into opening multiple additional sockets towards the
> > > website's load balancers / cloud gateways, which redirect the connections
> > > to different internal servers for different parts of the site/page, and
> > the
> > > SSO authentication/cookies passed on the additional sockets must
> > > originate from the same IP as the original socket. As a result outbound
> > > load-balancing does not work for these sites.
> > >
> > > The ideal functionality would be for 'sticky-address' to consider both
> > > source IP and destination IP after initially being load balanced by
> > > round-robin or random.
> >
> > Just use multipath routing, it will make sure that selected default routes
> > are sticky to source/dest pairs. You may want the states to be interface
> > bound if you need to nat-to on those links.
> >
> > On rerouting the multipath code reshuffles the selected routes in a way to
> > minimize the affected sessions. All this is done without any extra memory
> > usage since the hashing function is smart.
> >
> > --
> > :wq Claudio
> >
> >
> > > Thanks again, Andy.
> > >
> > > On Sat, Apr 3, 2021 at 12:40 PM Andy Lemin 
> > wrote:
> > >
> > > > Hi smart people :)
> > > >
> > > > The current implementation of ‘sticky-address‘ relates only to a sticky
> > > > source IP.
> > > > https://www.openbsd.org/faq/pf/pools.html
> > > >
> > > > This is used for inbound server load balancing, by ensuring that all
> > > > socket connections from the same client/user/IP on 

Re: problems with outbound load-balancing (PF sticky-address for destination IPs)

2021-09-29 Thread Andrew Lemin
Hi Claudio,

So you probably guessed I am using 'route-to { GW1, GW2, GW3, GW4 } random'
(and was wanting to add 'sticky-address' to this) based on your reply :)

"it will make sure that selected default routes are sticky to source/dest
pairs" - Are you saying that even though multipath routing uses hashing to
select the path (https://www.ietf.org/rfc/rfc2992.txt - "The router first
selects a key by performing a hash (e.g., CRC16) over the packet header
fields that identify a flow."), subsequent new sessions to the same dest IP
with different source ports will still get the same path? I thought a new
session with a new tuple to the same dest IP would get a different hashed
path with multipath?

"On rerouting the multipath code reshuffles the selected routes in a way to
minimize the affected sessions." - Are you saying, in the case where one
path goes down, it will migrate all the entries only for that failed path
onto the remaining good paths (like ecmp-fast-reroute ?)

Thanks for your time, Andy.

On Wed, Sep 29, 2021 at 5:21 PM Claudio Jeker 
wrote:

> On Wed, Sep 29, 2021 at 02:17:59PM +1000, Andrew Lemin wrote:
> > I see this question died on its arse! :)
> >
> > This is still an issue for outbound load-balancing over multiple internet
> > links.
> >
> > PF's 'sticky-address' parameter only works on source IPs (because it was
> > originally designed for use when hosting your own server pools - inbound
> > load balancing).
> > I.e. There is no way to configure 'sticky-address' to consider
> destination
> > IPs for outbound load balancing, so all subsequent outbound connections
> to
> > the same target IP originate from the same internet connection.
> >
> > The reason why this is desirable is because an increasing number of
> > websites use single sign on mechanisms (quite a few different
> architectures
> > expose the issue described here). After a users outbound connection is
> > initially randomly load balanced onto an internet connection, their
> browser
> > is redirected into opening multiple additional sockets towards the
> > website's load balancers / cloud gateways, which redirect the connections
> > to different internal servers for different parts of the site/page, and
> the
> > SSO authentication/cookies passed on the additional sockets must
> > originate from the same IP as the original socket. As a result outbound
> > load-balancing does not work for these sites.
> >
> > The ideal functionality would be for 'sticky-address' to consider both
> > source IP and destination IP after initially being load balanced by
> > round-robin or random.
>
> Just use multipath routing, it will make sure that selected default routes
> are sticky to source/dest pairs. You may want the states to be interface
> bound if you need to nat-to on those links.
>
> On rerouting the multipath code reshuffles the selected routes in a way to
> minimize the affected sessions. All this is done without any extra memory
> usage since the hashing function is smart.
>
> --
> :wq Claudio
>
>
> > Thanks again, Andy.
> >
> > On Sat, Apr 3, 2021 at 12:40 PM Andy Lemin 
> wrote:
> >
> > > Hi smart people :)
> > >
> > > The current implementation of ‘sticky-address‘ relates only to a sticky
> > > source IP.
> > > https://www.openbsd.org/faq/pf/pools.html
> > >
> > > This is used for inbound server load balancing, by ensuring that all
> > > socket connections from the same client/user/IP on the internet goes
> to the
> > > same server on your local server pool.
> > >
> > > This works great for ensuring simplified memory management of session
> > > artefacts on the application being hosted (the servers do not have to
> > > synchronise the users session data as extra sockets from that user will
> > > always connect to the same local server)
> > >
> > > However sticky-address does not have an equivalent for sticky
> destination
> > > IPs. For example when doing outbound load balancing over multiple ISP
> > > links, every single socket is load balanced randomly. This causes many
> > > websites to break (especially cookie login and single-sign-on style
> > > enterprise services), as the first outbound socket will originate
> randomly
> > > from one of the local ISP IPs, and the users login session/SSO (on the
> > > server side) will belong to that first random IP.
> > >
> > > When the user then browses to or uses another part of that same website
> > > which requires additional sockets, the additional sockets will pass
> the SSO
> > > credentials from the first socket, but the extra socket connection will
> > > again be randomly load-balanced, and so the remote server will reject
> the
> > > connection as it is originating from the wrong source IP etc.
> > >
> > > Therefore can I please propose a “sticky-address for destination IPs”
> as
> > > an analogue to the existing sticky-address for source IPs?
> > >
> > > This is now such a problem that we have to use sticky-address even on
> > > outbound load-balancing connections, which 

Re: problems with outbound load-balancing (PF sticky-address for destination IPs)

2021-09-29 Thread Claudio Jeker
On Wed, Sep 29, 2021 at 02:17:59PM +1000, Andrew Lemin wrote:
> I see this question died on its arse! :)
> 
> This is still an issue for outbound load-balancing over multiple internet
> links.
> 
> PF's 'sticky-address' parameter only works on source IPs (because it was
> originally designed for use when hosting your own server pools - inbound
> load balancing).
> I.e. There is no way to configure 'sticky-address' to consider destination
> IPs for outbound load balancing, so all subsequent outbound connections to
> the same target IP originate from the same internet connection.
> 
> The reason why this is desirable is because an increasing number of
> websites use single sign on mechanisms (quite a few different architectures
> expose the issue described here). After a user's outbound connection is
> initially randomly load balanced onto an internet connection, their browser
> is redirected into opening multiple additional sockets towards the
> website's load balancers / cloud gateways, which redirect the connections
> to different internal servers for different parts of the site/page, and the
> SSO authentication/cookies passed on the additional sockets must
> originate from the same IP as the original socket. As a result outbound
> load-balancing does not work for these sites.
> 
> The ideal functionality would be for 'sticky-address' to consider both
> source IP and destination IP after initially being load balanced by
> round-robin or random.

Just use multipath routing, it will make sure that selected default routes
are sticky to source/dest pairs. You may want the states to be interface
bound if you need to nat-to on those links.

On rerouting the multipath code reshuffles the selected routes in a way to
minimize the affected sessions. All this is done without any extra memory
usage since the hashing function is smart.

-- 
:wq Claudio

 
> Thanks again, Andy.
> 
> On Sat, Apr 3, 2021 at 12:40 PM Andy Lemin  wrote:
> 
> > Hi smart people :)
> >
> > The current implementation of ‘sticky-address‘ relates only to a sticky
> > source IP.
> > https://www.openbsd.org/faq/pf/pools.html
> >
> > This is used for inbound server load balancing, by ensuring that all
> > socket connections from the same client/user/IP on the internet goes to the
> > same server on your local server pool.
> >
> > This works great for ensuring simplified memory management of session
> > artefacts on the application being hosted (the servers do not have to
> > synchronise the users session data as extra sockets from that user will
> > always connect to the same local server)
> >
> > However sticky-address does not have an equivalent for sticky destination
> > IPs. For example when doing outbound load balancing over multiple ISP
> > links, every single socket is load balanced randomly. This causes many
> > websites to break (especially cookie login and single-sign-on style
> > enterprise services), as the first outbound socket will originate randomly
> > from one of the local ISP IPs, and the users login session/SSO (on the
> > server side) will belong to that first random IP.
> >
> > When the user then browses to or uses another part of that same website
> > which requires additional sockets, the additional sockets will pass the SSO
> > credentials from the first socket, but the extra socket connection will
> > again be randomly load-balanced, and so the remote server will reject the
> > connection as it is originating from the wrong source IP etc.
> >
> > Therefore can I please propose a “sticky-address for destination IPs” as
> > an analogue to the existing sticky-address for source IPs?
> >
> > This is now such a problem that we have to use sticky-address even on
> > outbound load-balancing connections, which causes internal user1 to always
> > use the same ISP for _everything_ etc. While this does stop the breakage, it
> > does not result in evenly distributed balancing of traffic, as users are
> > locked to one single transit, for all their web browsing for the rest of
> > the day after being randomly balanced once first-thing in the morning,
> > rather than all users balancing over all transits throughout the day.
> >
> > Another pain; using the current source-ip sticky-address for outbound
> > balancing, makes it hard to drain transits for maintenance. For example
> > without source sticky-address balancing, you can just remove the transit
> > from the Pf rule, and after some time, all traffic will eventually move
> > over to the other transits, allowing the first to be shut down for whatever
> > needs. But with the current source-ip sticky-address, that first transit
> > will take months to drain in a real-world situations..
> >
> > lastly just as a nice-to-have, how feasible would a deterministic load
> > balancing algorithm be? So that balancing selection is done based on the
> > “least utilised” path?
> >
> > Thanks for your time and consideration,
> > Kindest regards Andy
> >
> >
> >
> > Sent from a teeny tiny