I had some more time to play with this; it seems to be related to retiring old threads, not actual problem on the home server. Some new observations below.
> Alan DeKok wrote on Aug28, 2012: > Brian Julin wrote: > > I'm currently hunting a problem that causes a recent checkout of FR3.0 > > to abort but which does not seem to be affecting an older revision > > (April 8th or so) > > of FR3.0 on another box. I do have a couple small in-house patches applied > > but they should probably not be relevant. > It's always worth checking. OK, I've tested with a recent clean clone and it still has the problem. > > request_dequeue found request 0x13d5220 > > ASSERT FAILED threads.c[468]: request->magic == REQUEST_MAGIC > Which means that the request has been deleted, but is still in the > queue. That's not supposed to happen... After collecting some more debug logs, I noticed that this problem was happening too often on requests numbered around 260 to be a coincidence. It turns out this happens after a thread is marked for recycling due to having handled over 128 requests. I'm no longer positive that the home server is not responding, though it could have similar limits set up on the RADSEC connection so we cannot completely eliminate that as a possibility. It seems a lot more likely, though, that something is getting wedged up locally and the request to the home server never actually hits the wire. The server will ride over the thread recycling if it is handling requests that do not need to be proxied at the moment when the threads are marked; it only has trouble in the middle of conversations. Below is a debug log with some extra radlogs thrown in by hand. The "Reaping" lines happen in the loop that tests whether threads have handled so many connections that they should be retired. Is it normal for a thread to grab and handle requests after it has been marked for recycling? We start in the middle of an ongoing EAP conversation. Sending Access-Challenge of id 96 to 127.0.0.1 port 34912 [request dump omitted] (259) Finished request 259. Thread 2 waiting to be assigned a request Waking up in 0.3 seconds. rad_recv: Access-Request packet from host 127.0.0.1 port 34912, id=242, length=273 Deleting thread 3 Deleting thread 4 Reaping... Reaping thread 1 due to number of requests handled (130/128 handled) Reaping... Reaping thread 2 due to number of requests handled (131/128 handled) Waking up in 0.2 seconds. Thread 2 got semaphore Thread 2 handling request 260, (132 handled so far) [request dump omitted] Thread 1 got semaphore Thread 1 exiting... [unlang trace omitted] Sending Access-Request of id 238 to XXXXXXXXXXX port 2083 [request dump omitted] Thread 2 exiting... Waking up in 0.3 seconds. Waking up in 0.4 seconds. Waking up in 0.7 seconds. Waking up in 1.1 seconds. rad_recv: Access-Request packet from host 127.0.0.1 port 34912, id=242, length=273 Discarding duplicate request from client EDUROAM_OUT port 34912 - ID: 242 due to unfinished request 260 Waking up in 0.9 seconds. Waking up in 1.1 seconds. rad_recv: Access-Request packet from host 127.0.0.1 port 34912, id=242, length=273 Discarding duplicate request from client EDUROAM_OUT port 34912 - ID: 242 due to unfinished request 260 (256) Cleaning up request packet ID 213 with timestamp +11084 rad_recv: Access-Request packet from host 127.0.0.1 port 34912, id=242, length=273 Discarding duplicate request from client EDUROAM_OUT port 34912 - ID: 242 due to unfinished request 260 (256) Cleaning up request packet ID 213 with timestamp +11084 Waking up in 0.2 seconds. (257) Cleaning up request packet ID 10 with timestamp +11085 Waking up in 0.2 seconds. (258) Cleaning up request packet ID 151 with timestamp +11085 Waking up in 0.2 seconds. (259) Cleaning up request packet ID 96 with timestamp +11085 Waking up in 2.2 seconds. rad_recv: Access-Request packet from host 127.0.0.1 port 34912, id=242, length=273 Discarding duplicate request from client EDUROAM_OUT port 34912 - ID: 242 due to unfinished request 260 Waking up in 1.1 seconds. Waking up in 3.7 seconds. rad_recv: Access-Request packet from host 127.0.0.1 port 34912, id=242, length=273 Discarding duplicate request from client EDUROAM_OUT port 34912 - ID: 242 due to unfinished request 260 Waking up in 2.9 seconds. rad_recv: Access-Request packet from host 127.0.0.1 port 34912, id=242, length=273 Discarding duplicate request from client EDUROAM_OUT port 34912 - ID: 242 due to unfinished request 260 Waking up in 0.9 seconds. Waking up in 5.6 seconds. Waking up in 8.5 seconds. Waking up in 12.8 seconds. Waking up in 19.2 seconds. Waking up in 28.8 seconds. (260) <queue> : Cleaning up request packet ID 242 with timestamp +11086 Ready to process requests. rad_recv: Access-Request packet from host 127.0.0.1 port 34912, id=104, length=272 Threads: total/active/spare threads = 2/0/2 Threads: Spawning 2 spares Thread spawned new child 5. Total threads in pool: 3 Thread spawned new child 6. Total threads in pool: 4 Waking up in 0.3 seconds. Thread 5 waiting to be assigned a request Thread 5 got semaphore ASSERT FAILED threads.c[461]: request->magic == REQUEST_MAGIC Thread 6 waiting to be assigned a request Thread 6 got semaphore - List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html