On Friday, October 18, 2002, at 02:37 PM, Jim Wilcoxson wrote:
What is the point of having the DB module load, if it can't connect to the DB server?
I use the external database driver interface to wire up AOLservers to things that aren't really databases, so I can take advantage of pooling. Some of these external data sources have transient connectivity problems, and I've engineered my AOLserver app to tolerate the connection outages. I can detect a bad connection from a failed gethandle or from a subsequent failure on the handle, and my app does the right thing. There's no reason to force a hard failure at the driver level.
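To make the idea concrete, here is a rough, language-neutral sketch in Python of an app tolerating a transient outage instead of dying. Nothing here is AOLserver or ns_db API: `FlakyPool`, `ConnectionUnavailable`, and `fetch_with_fallback` are hypothetical names invented for illustration, and the "pool" is a stand-in that simply fails a fixed number of times.

```python
class ConnectionUnavailable(Exception):
    """Raised by the (hypothetical) pool when no usable handle can be obtained."""


class FlakyPool:
    """Stand-in pool whose backend is unreachable for the first `outages` attempts."""

    def __init__(self, outages):
        self.outages = outages

    def get_handle(self):
        if self.outages > 0:
            self.outages -= 1
            raise ConnectionUnavailable("backend not reachable")
        return "handle"

    def release(self, handle):
        # A real pool would return the handle for reuse; nothing to do here.
        pass


def fetch_with_fallback(pool, query, fallback, attempts=2):
    """Run `query` on a pooled handle; return `fallback` if the backend stays down."""
    for _ in range(attempts):
        try:
            handle = pool.get_handle()
        except ConnectionUnavailable:
            continue  # transient outage: try again, the process keeps running
        try:
            return query(handle)
        finally:
            pool.release(handle)
    return fallback  # the app decides what a degraded answer looks like
```

With one failed attempt the second try succeeds; if the backend stays down for every attempt, the caller gets the fallback value and the server process is untouched.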
So we had to hack something up in our TCL startup routines to actually *DO* a database operation during the AS boot to make sure the database was accessible.
One man's hack is another man's version of robustness engineering. I think this was the correct way to handle the situation.
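As a sketch of what such an application-level startup check amounts to (in Python for illustration only, not the Tcl that would actually run inside AOLserver), the app probes the backend once at boot and applies its own policy. `backend_reachable`, `startup_check`, and the `fail_hard` flag are hypothetical names; the probe is whatever cheap operation the app chooses.

```python
import sys


def backend_reachable(probe):
    """Run one cheap operation against the backend; report success as a bool."""
    try:
        probe()
    except Exception:
        return False
    return True


def startup_check(probe, fail_hard, exit_fn=sys.exit):
    """Probe the backend at boot and let the app's own policy decide what happens.

    `fail_hard` is the application's choice, not the driver's: True aborts
    startup (useful behind a load balancer), False lets the server come up
    and handle the outage per request.
    """
    ok = backend_reachable(probe)
    if not ok and fail_hard:
        exit_fn(1)
    return ok
```

A site that wants the hard failure calls `startup_check(probe, fail_hard=True)`; a site with flaky data sources passes `fail_hard=False` and keeps serving.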
And the retry code in the DB mechanism is not exactly robust: if there is a pooled connection to the DB server and the connection dies, AS gives back weird errors to ns_db gethandle until the pool is bounced, which doesn't happen automatically.
I've never had that happen unless there were bugs in the drivers, and I'd prefer to fix those.
The other thing to keep in mind is that if a particular server fails hard, then load-balancers etc. handle that much better than when a server returns "Sorry, we could not get a DB handle for your request".
I'm not saying you can't make your server fail hard; I'm saying that the driver author should not be the one to make that call. I do not regard the AOLserver process as a resource the driver is entitled to control. The driver is a guest in the AOLserver process, and it should not have the authority to tear down the process unless it determines that the process itself is irrecoverably flawed. Otherwise, it should report failures and let the app decide whether the process should be terminated.