The issue described does smell like a possible connection leak, a blocker in a high-load environment.
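
To make "leak" concrete: a connection that is obtained and never closed keeps counting against the database's connection limit until something restarts. Below is a minimal Java sketch of one way a test could count such connections. The names `LeakCounter`, `track`, `openCount`, `runSafely`, `runLeaky` and `stubConnection` are hypothetical names of mine, not Mifos, Hibernate or c3p0 APIs, and the stub stands in for a real JDBC connection so the sketch runs without a database.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: counts connections that were handed out but not yet closed.
class LeakCounter {
    private final AtomicInteger open = new AtomicInteger();

    // Wraps a connection in a dynamic proxy that counts the first close() call.
    Connection track(Connection real) {
        open.incrementAndGet();
        AtomicBoolean closed = new AtomicBoolean(false);
        InvocationHandler handler = (proxy, method, args) -> {
            if ("close".equals(method.getName()) && closed.compareAndSet(false, true)) {
                open.decrementAndGet();
            }
            try {
                return method.invoke(real, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the underlying exception unchanged
            }
        };
        return newProxy(handler);
    }

    // Number of tracked connections that have not been closed yet.
    int openCount() {
        return open.get();
    }

    // Stand-in for a real JDBC connection (assumption: no database is available here).
    static Connection stubConnection() {
        return newProxy((proxy, method, args) -> {
            Class<?> type = method.getReturnType();
            if (type == boolean.class) return false;
            if (type == int.class) return 0;
            return null;
        });
    }

    private static Connection newProxy(InvocationHandler handler) {
        return (Connection) Proxy.newProxyInstance(
                LeakCounter.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }

    // Leaky pattern: the connection is obtained and never closed.
    static void runLeaky(LeakCounter counter) {
        Connection c = counter.track(stubConnection());
        // ... work happens here, but close() is never called ...
    }

    // Safe pattern: close() sits in a finally block, so it runs even if the work throws.
    static void runSafely(LeakCounter counter) {
        Connection c = counter.track(stubConnection());
        try {
            // ... work with the connection ...
        } finally {
            try {
                c.close();
            } catch (SQLException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        LeakCounter counter = new LeakCounter();
        runSafely(counter);
        System.out.println("after safe path: " + counter.openCount());  // prints 0
        runLeaky(counter);
        System.out.println("after leaky path: " + counter.openCount()); // prints 1
    }
}
```

In a real test suite the same idea would presumably wrap whatever hands out connections, with the check on the open count placed in the shared teardown; where exactly that hook belongs in Mifos is something I have not verified.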
For those who know the Mifos code base better: is any connection pool used? I notice c3p0 is part of the lib, but I can't see any config that points to its usage.

A couple of suggestions for diagnosing this type of problem in general:

1. If a connection pool is used: some code can be added to the unit tests (maybe the base MifosTestCase setup/teardown) to ensure there is no connection leak. If the unit test coverage is good, that can help pin down the root cause.

2. On production servers, add logging of every database connection acquisition and release. The idea is that someone can parse the log files to find the scenario in which a connection is not released. One way to do this is to use a logging wrapper around the MySQL JDBC driver:
http://rkbloom.net/logdriver/

Then, in the Mifos log, a normal HTTP request should show up with a pattern indicating that each connection obtained is released. If some request shows that a connection was obtained but not released, one can at least narrow down the problem areas based on the nature of the request.

2007-11-14 20:16:48,941 [http-processor8080_13] INFO org.mifos.SomeClass [some msg to indicate what request the thread is serving, maybe the request's URL]
...
2007-11-14 20:16:48,941 [http-processor8080_13] DEBUG net.rkbloom.logdriver.LogConnection connection obtained.
...
2007-11-14 20:16:48,941 [http-processor8080_13] DEBUG net.rkbloom.logdriver.LogConnection connection released.
...

On Nov 14, 2007 8:46 PM, Amy Bensinger (Contractor) <[EMAIL PROTECTED]> wrote:
>
> I'm sure we can put something into place in addition to the existing
> monitoring.
> --aB
>
> ________________________________
>
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Tom Bostelmann
> Sent: Thursday, November 15, 2007 2:04 AM
> To: Developer
> Subject: Re: [Mifos-developer] Fw: Mifos production issues and
> likely hibernateconnection pooling
>
> I had similar thoughts when I first saw the email thread on this issue.
> However, as people discussed the problem it appeared to turn into a Session
> timeout problem and then quickly became a non-issue. So I haven't given it
> much thought since.
>
> What James is describing gives us a sense of how large this issue could
> be, so I agree that we should spend some time analyzing this.
>
> In general, we should be monitoring the number of open connections to the
> database at GK. There's probably a mysqladmin option that provides a count
> of the open connections. If, in fact, we find that there is an abnormal
> number of open connections at one time, we should then proceed to analyzing
> the code.
>
> For now, is there someone who can set up something at GK to record the
> number of open connections to mysql throughout the day?
>
> On Nov 14, 2007 11:59 AM, James Dailey <[EMAIL PROTECTED]> wrote:
>
> Amy -
>
> I'm not your guy for patches, but I would suggest that this deserves a P1
> ranking and more discussion here - if I am right with my lens on this, it
> is likely a fair amount of work to puzzle out what is happening and then to
> correct it. The current ranking is P2:
> https://mifos.dev.java.net/issues/show_bug.cgi?id=1514
>
> Again, if I am right in understanding this and the current priorities of
> Mifos, it is one of those design issues that only expresses itself
> intermittently to the user or superuser, and gets worse over time. That is,
> the more that you try to use Mifos in a large institutional setting, or the
> longer that you run Mifos continuously, the more this error exhibits, which
> kills all functionality since it causes a freeze.
> It also potentially raises
> the *specter* of data corruption if you end up abending data writes on a
> regular basis. But to the user, it is a minor inconvenience - which is a
> classic mismatch on plumbing issues. So, that would be my first question to
> this community list.
>
> Some things I could note:
>
> Van was working on the persistence layer earlier
> http://sourceforge.net/mailarchive/message.php?msg_id=46DF94D6.6080704%40gmail.com
> Quote:
> > "As you have found, Mifos uses the idea of a Persistence layer. These
> > Persistence classes in Mifos are not yet implemented in as consistent a
> > way as we would like, but the idea is that the creation, deletion,
> > update and lookup/retrieval of a given functional group of objects
> > should be encapsulated by methods defined on a Persistence class. For
> > reference, these Persistence classes are close to the "Repository"
> > design pattern as described by Fowler
> > (http://www.martinfowler.com/eaaCatalog/repository.html) and others. The
> > goal is to make these Persistence classes follow the Repository design
> > pattern-- there is still plenty of work to do in order to achieve this
> > goal."
>
> So, that's one area of major work in plumbing. But also, how are the
> connection pools handled in Mifos?
>
> Looking at
> http://sourceforge.net/mailarchive/message.php?msg_id=9DD845C1ED0D5D40B4B56DF5A4B1EB0EABB40C%40gfmail.gfusa.org
> I noted that Jim Kingdon was working on removing the JNDI dependency with
> Terry Wong as late as October 2006.
>
> Finally, I note that I find only one mention of a "finally" block in the
> mifos code (see
> /mifosrepo/trunk/DevSpace/WorkSpace/src/org/mifos/framework/security/util/LoginFilter.java):
>
>     finally {
>         HibernateUtil.closeSession();
>     }
>
> which is recommended as a general practice in the reading I have found on
> this subject, as follows:
>
> "When using connection pooling, it is important to remember that a chunk of
> bad code that neglects to return connections can starve the rest of the
> application, causing it to eventually run out of connections and hang
> (potentially failing nowhere near the actual problem). To test for this,
> set the maximum connections in your pool to a small number (as low as 1),
> and use tools like p6spy and IronTrack SQL (described above) to look for
> statements that fail to close. This problem can be avoided by always using
> a finally block to close your connection..."
>
> I defer to those now on the list to comment - Sam? William? Terry? Jim?
> Tom? Alija?
>
> - James
>
> ----- Forwarded Message ----
> From: Amy Bensinger (Contractor) <[EMAIL PROTECTED]>
> To: j dailey <[EMAIL PROTECTED]>; Developer <[email protected]>
> Sent: Tuesday, November 13, 2007 9:29:14 PM
> Subject: RE: [Mifos-developer] Mifos production issues and likely
> hibernateconnection pooling
>
> Hi, all. James, thanks for your response.
>
> Yes, the issue is bothersome, but it does *not* occur with great
> frequency and has been a known issue for a while. GK has a workaround
> for the issue in place. Naganand and I wanted to throw it out to the
> community to see what options were available. As GK has been up and
> running on all branches, you'll see more participation on the listserv.
>
> Don't worry--high priority issues for active deployments are quite
> naturally handled with due concern, and in fact I am in Bangalore for
> several months to generally pester Naganand, document how Mifos is being
> used in the field, and identify pain points to be fixed ASAP, as well as
> supporting other deployments in India. (If you are curious about what
> the issues are, check the issue tracker with keyword GrameenKoota, the
> Deployment Issues page, or the Ongoing Issues page.)
> http://mifos.org/developers/how-mifos-is-used gives a short picture of
> how it's really being used (though I suspect you know some of it already
> ;) )
>
> Aliya has also looked into the issue already and likely a defect will be
> filed by Grameen Koota.
>
> Thanks again for noticing and please feel free to work on the patch ;).
>
> Cheers!
>
> --Amy
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto: [EMAIL PROTECTED] On Behalf Of j dailey
> Sent: Wednesday, November 14, 2007 9:39 AM
> To: [email protected]
> Subject: [Mifos-developer] Mifos production issues and likely
> hibernateconnection pooling
>
> Yesterday, Naganand, who is running Mifos in production at Grameen Koota
> posted the plea below - basically GF's largest (known) Mifos customer.
> It sounded to me like a pretty big deal, requiring some sort of
> immediate response at a technical level. Sorry to jump on toes, but was
> there a response off listserv?
>
> The likely problem here is that the database connections are being
> opened and not closed properly (each query is left open as a separate
> connection ) - so someone with good hibernate chops might need to be
> drafted into this (Sam Lee - I see you are active now and I recall you
> had this skill?). Maybe this link would be helpful?
> http://www.informit.com/articles/article.aspx?p=353736&seqNum=4
>
> At a minimum, it looks as if the stress testing regime and unit tests
> around this are weak.
>
> (Hiya Naganand - hope you are doing well. )
>
> - James
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 12 Nov 2007 05:42:09 -0500
> From: "Amy Bensinger (Contractor)" <[EMAIL PROTECTED]>
> Subject: [Mifos-developer] FW: Information on database performance and
> optimizations?
> To: "Developer" <[email protected]>, "Mifos
> functional discussions" <[EMAIL PROTECTED]>
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset="us-ascii"
>
> Forwarding a post on behalf of Nagananda...
>
> From: Naganand [mailto:[EMAIL PROTECTED]
> Sent: 12 November,2007 04:00 PM
> To: '[EMAIL PROTECTED] '
> Subject:
>
> Hi all,
>
> Greetings from Grameen Koota!
>
> We have an issue on our live server. Can anybody help us?
>
> We have 44 branches running live on the server and most of them do data
> entry using low speed internet connections. The connection is sometimes
> strong and sometimes weak and breaking.
>
> It so happens that sometimes mysql on our database server overshoots the
> number of users, which is currently defined as 150 [though the number of
> branches is only 44 + another 10 users max], and the server hangs and
> stops responding. If I increase the users limit then the server starts
> responding slowly.
>
> Also sometimes the branches load reports and there is a connection
> breakage in between. Then the users reconnect and reload the reports. But
> on the server the old queries do not end and mysql hangs. We need to
> restart mysql or kill the queries 1 by 1 and then the server works
> faster.
>
> Can anybody help us know why the no. of users goes out of limit?
>
> And why do the queries hang instead of ending when the user is
> disconnected from the net?
>
> Regards,
>
> Naganand
>
> AGM [Information Technology Department]
>
> Grameen Koota,
> Avalahalli, Anjanapura Post
> Bangalore- 560062
>
> Email: [EMAIL PROTECTED]
> [EMAIL PROTECTED]
>
> Phone: 080-28436838,
> Mob: 9341940803
>
> Website: www.grameenkoota.org
>
> ____________________________________________________________________________________
> Be a better sports nut! Let your teams follow you
> with Yahoo Mobile. Try it now.
> http://mobile.yahoo.com/sports;_ylt=At9_qDKvtAbMuh1G1SQtBI7ntAcJ
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems? Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
