Hello list, I'm having a problem with a production system.
I have an application that runs 24x7. It is multithreaded, and every thread
has its own connection to the database.
Each thread serves one of two types of client: human-operated PCs used for
billing, and Rabbit microcontrollers that control some electronic devices.

The problem we are having is that approximately once a month, all the
microcontrollers attached to their threads (it's a common pool of threads)
stop responding. Looking more closely at the logs, we found that these
threads are stalled at a SELECT NOW() issued immediately after a
BEGIN TRANSACTION. The thread then stops responding, since it is waiting
for that SQL command to finish.

The curious part is that all PCs still work ok, even though they also
execute SELECT NOW().

This log represents the session for one PC as an example.

02/09/2010 09:26:32.593 | 192.168.203.104:2255 | Thread 17: (CAJA 4...@parkingb->Exterior)Thread 17: [TRANSACTION BEGIN]
02/09/2010 09:26:32.593 | (17) | ROLLBACK
02/09/2010 09:26:32.593 | (17) | BEGIN TRANSACTION
02/09/2010 09:26:32.593 | (17) | SELECT NOW()
02/09/2010 09:26:32.593 | 192.168.203.104:2255 | Thread 17: (CAJA 4...@parkingb->Exterior)0 ms (errorCode=[200] bytesRead=[17] bytesWritten=[3])

This log represents the session for one thread attached to a device.

02/09/2010 09:15:01.265 | 192.168.203.52:1127 | Thread  8: (SAL2ABJ t...@parking B->Exterior)Thread 8: [EVENT AUTH TR 005001040719 02/09/2010 09:15:01 1]
02/09/2010 09:15:01.265 | (8) | BEGIN TRANSACTION
02/09/2010 09:15:01.265 | (8) | SELECT NOW()

And the thread never continues; it never gets past the SELECT NOW().
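
To make the pattern concrete, here is a minimal Python sketch of a scan that
flags threads whose last logged line is a SQL statement with nothing after it.
It is only an illustration, not what our application does. It assumes that
each log entry sits on a single line in the real file, that timestamps are
dd/mm/yyyy, and that one minute of silence is a reasonable threshold; the
names parse_ts and find_stalled are made up for this example.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout assumed to be dd/mm/yyyy hh:mm:ss.mmm (an assumption).
TS = r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3})"
# Statement lines, e.g. "02/09/2010 09:15:01.265 | (8) | SELECT NOW()"
STATEMENT = re.compile(rf"^{TS} \| \((\d+)\) \| (.+)$")
# Event lines, e.g. "02/09/2010 09:15:01.265 | 192.168.203.52:1127 | Thread  8: ..."
EVENT = re.compile(rf"^{TS} \| [\d.]+:\d+ \| Thread\s+(\d+):")


def parse_ts(text):
    """Parse a log timestamp into a datetime."""
    return datetime.strptime(text, "%d/%m/%Y %H:%M:%S.%f")


def find_stalled(lines, now, max_age=timedelta(minutes=1)):
    """Return {thread_id: (timestamp, sql)} for threads whose most recent
    log line is a SQL statement older than max_age."""
    last = {}  # thread id -> (timestamp, sql text, or None for an event line)
    for line in lines:
        line = line.strip()
        m = STATEMENT.match(line)
        if m:
            last[int(m.group(2))] = (parse_ts(m.group(1)), m.group(3))
            continue
        m = EVENT.match(line)
        if m:
            last[int(m.group(2))] = (parse_ts(m.group(1)), None)
    return {
        tid: (ts, sql)
        for tid, (ts, sql) in last.items()
        if sql is not None and now - ts > max_age
    }
```

Fed the two sessions above, a scan like this would report only the device
thread, because the PC thread logs a completion line after its SELECT NOW().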

The only difference from the database's point of view is that PC
transactions, since they are much more complex, usually start with a
ROLLBACK (arguably unnecessary; it will eventually be removed), while device
transactions usually just start with a COMMIT.

Do you have any idea what could cause a SELECT NOW() to never return?
I'm baffled.

Restarting the server application solves the problem (doing so drops all
connections to the database), and after that it doesn't reappear for at
least a month. It has happened approximately 4 or 5 times already.
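
Until the cause is found, one stopgap I am considering is to bound how long
a thread waits for a statement, so that a stall is at least detected instead
of hanging silently. The following Python sketch shows the idea only. Our
application does not work this way today, run_with_deadline is a hypothetical
helper, and blocking_call stands in for whatever function actually sends the
SQL.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def run_with_deadline(executor, blocking_call, timeout_seconds):
    """Run blocking_call on the executor and wait at most timeout_seconds.

    Returns (True, result) if the call finished in time, or (False, None)
    if it did not. Exceptions raised by blocking_call propagate unchanged.
    """
    future = executor.submit(blocking_call)
    try:
        return True, future.result(timeout=timeout_seconds)
    except TimeoutError:
        # The worker is still blocked inside blocking_call. The caller can
        # only log the stall and then drop or reset the connection involved.
        return False, None
```

Note that this only frees the caller. The worker that issued the statement
stays blocked until its connection is closed or the statement returns.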

Thank you,
Eduardo.
