On 1.3.2023 12.34, Schnurrenberger Tobias (ID) via radiator wrote:

We are facing a problem with tacacs performance. In our network there is an 
external service that connects to all network devices every 5 minutes, which 
causes a huge load of tacacs requests. It peaks at approx. 4000 new TCP sessions 
per second (measured by conntrack -E -e NEW | pv -l -i 1 -r > /dev/null) on 
the tacacs server. This external service should not be changed.

Thanks for the configuration and details about your operating environment. I think there are a couple of options for updating the frontend. The backend already seems to be well set up for handling large numbers of requests.

In the radiator tacacs log there are a lot of these error messages:
ERR: ServerTACACSPLUS Stream sysread for 10.1.1.16 (10.1.1.16 port 60088) 
failed: . Peer probably disconnected.
(Although this client connected with IPv4, most of them are configured to use 
IPv6 only.)

The default log level for this message was changed in Radiator 4.27, where it is now a DEBUG level message. You can reconfigure it like this within <ServerTACACSPLUS>:

    DisconnectTraceLevel 4

Disconnects by a TACACS+ client are normal and expected. Some releases ago, the TCP stream handling was unified, and while there are some protocols where a disconnect by the client is unexpected, TACACS+ is not one of them. In short: this is not an error with TACACS+.

And on the client side (network devices) we see messages, that they didn't get 
an answer from the tacacs servers, e.g.:
%TACACS-3-TACACS_ERROR_MESSAGE: All servers failed to respond

This is likely caused by the frontend not being able to keep up with all the requests. As mentioned above, it is not related to the disconnect log messages on the Radiator side.

We tried to configure the tacacs service as a farm with 16 children, hoping the load 
would be balanced. However, all we see is that the "frontend" process is using 
~100% of one CPU and the farm children are staying relatively calm (0-20% CPU usage).

You could keep this for now since it's already working. Once the frontend is able to handle all requests, it should raise the backend load too.

With all these things tried it seems that the ServerTACACSPLUS is just not fast 
enough. Is there any other option to increase the performance of our tacacs 
service?

I agree with this diagnosis. The first thing you could consider is making use of 'AllowAuthorizeOnly 1', which you have already configured.

This option allows Radiator to look up authorisation information for TACACS+ when no such information is already present. By default, the GroupMemberAttr value used for authorisation must be fetched during authentication.

When AllowAuthorizeOnly is set, Radiator triggers an Access-Request that has 'Service-Type = Authorize-Only' but no User-Password attribute. In your case you could catch these requests with a specific Handler and then run the 'authorizeSQL' AuthBy only within this new Handler.

When you know you can handle 'Service-Type = Authorize-Only' TACACS+ derived access requests, you can enable FarmSize on the frontend.

When you do that, you can have parallel workers accepting and processing TACACS+ requests. It's likely that some related access and authorisation requests are picked up by different workers, but when that happens, the worker can authorise the TACACS+ authorization requests separately.
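
As a rough sketch of what enabling FarmSize on the frontend could look like: FarmSize is a global parameter, so it goes outside the <ServerTACACSPLUS> clause. The value 4 below is only an assumed starting point for illustration, not a figure derived from your setup.

```
# Frontend configuration, global section.
# The FarmSize value is an assumption for illustration; tune it to the
# number of CPU cores available on the frontend host.
FarmSize 4

<ServerTACACSPLUS>
    # Existing parameters (Key, Port, GroupMemberAttr, ...) stay as they are
    AllowAuthorizeOnly 1
</ServerTACACSPLUS>
```

With this in place, an authorisation request that lands on a different worker than the one that did the authentication is expected to trigger the 'Service-Type = Authorize-Only' request described above.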

FRONTEND:

<ServerTACACSPLUS>
AuthorizationTimeout 86400
Key %{GlobalVar:FailbackKey}
Port 49
AddToRequest NAS-Identifier=TACACS
GroupMemberAttr X-MY-TACACSGROUP
AllowAuthorizeOnly 1

The tacacsplus example in goodies shows how to handle requests triggered by AllowAuthorizeOnly. You may want to do some initial testing with the goodies example to see how it behaves when FarmSize is > 1.


BACKEND:

# Handlers
<Handler Request-Type=Accounting-Request>
Identifier TacacsAcct
AuthBy AlwaysAccept
AcctLogFileName %L/acct-tacacs.log
AcctLogFileFormatHook file:"%D/hooks/acctlogformat-tacacs.hook"
</Handler>

Here's how to catch the requests that are triggered when the same frontend worker does not process both the TACACS+ authentication and the subsequent authorisation request:

<Handler Service-Type=Authorize-Only>
   # Identifier, AuthByPolicy, etc.
   # No AuthBy authenticateSQL - there's no User-Password in the request
   AuthBy authorizeSQL
</Handler>

<Handler>
Identifier SQLtacacs
AuthByPolicy ContinueWhileAccept
AuthBy authenticateSQL
AuthBy authorizeSQL
AuthBy InternalReply
RejectHasReason
AuthLog authlog-tacacs
</Handler>

# end

Another option, instead of FarmSize on the frontend, is to run some type of TCP load balancer in front of separate Radiator instances listening on different TACACS+ ports. HAProxy could work, but I'd first try FarmSize on the frontend, with the backend set up so that it can handle authorize-only requests.
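
For the load balancer alternative, a minimal sketch of the Radiator side could look like the following. It assumes two instances with separate configuration files; the ports 4901 and 4902 are hypothetical, and the load balancer configuration itself is not shown.

```
# Instance 1 configuration file (hypothetical port)
<ServerTACACSPLUS>
    Port 4901
    # Key, GroupMemberAttr, AllowAuthorizeOnly, etc. as in the current frontend
</ServerTACACSPLUS>

# Instance 2 configuration file (hypothetical port)
<ServerTACACSPLUS>
    Port 4902
    # Same remaining parameters as instance 1
</ServerTACACSPLUS>
```

The load balancer would then accept connections on port 49 and spread them over the instance ports.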

Please let us know if the above helps.

Thanks,
Heikki

--
Heikki Vatiainen <h...@open.com.au>

Radiator: the most portable, flexible and configurable RADIUS server
anywhere. SQL, proxy, DBM, files, LDAP, TACACS+, PAM, Active Directory,
EAP, TLS, TTLS, PEAP, WiMAX, RSA, Vasco, Yubikey, HOTP, TOTP,
DIAMETER etc. Full source on Unix, Windows, MacOSX, Solaris, VMS, etc.
_______________________________________________
radiator mailing list
radiator@lists.open.com.au
https://lists.open.com.au/mailman/listinfo/radiator