[uportal-dev] uPortal database connection pool size
Answering the question below on the uportal-user list got me thinking (a dangerous thing indeed ... :-) ).

Currently all 3 of the uPortal DB connection pools defined in datasourceContext.xml are set to the same max value (75 by default). Glancing at the code, I think the raw events DB pool and the aggregation events DB pool are used only from timed threads and need only 1 DB connection each (see https://github.com/Jasig/uPortal/blob/uportal-4.1.2/uportal-war/src/main/java/org/jasig/portal/events/handlers/QueueingEventHandler.java#L41 and https://github.com/Jasig/uPortal/blob/uportal-4.1.2/uportal-war/src/main/resources/properties/contexts/schedulerContext.xml#L73). Both could have a smaller maxActive value, which would limit the damage if an error somehow consumed large numbers of DB connections and impacted uPortal and portlets (by using up too many connections on the DB server).

I was thinking of setting their maxActive value to 5 to allow simultaneous threads for saving, purging, and querying. Does anyone see a problem with this strategy?

UWMadison or someone with an active system: can you glance at the DB MBeans AggrEventsDB and RawEventsDB in uPortal/Datasource and see whether NumActive + NumIdle is even close to 5 on your system? (Unfortunately, without monitoring tools I don't see how you'd find out the maximum number of connections ever made.)

Thanks in advance for your insights and thoughts.

James Wennmacher - Unicon
480.558.2420

Forwarded Message
Subject: Re: [uportal-user] Increase database connection pool size
Date: Tue, 02 Dec 2014 11:02:02 -0700
From: James Wennmacher jwennmac...@unicon.net
To: uportal-u...@lists.jasig.org

Database connection counts are defined in uportal-war/src/main/resources/properties/contexts/datasourceContext.xml. uPortal uses the DB connections for a fairly brief period of time.
The message 'none available[size:75; busy:0; idle:0; lastwait:5000]' plus your comment about leaving it overnight makes me wonder if somehow the connections are being lost and not reclaimed. I suggest:

1. Ensure that the load test is not hitting individual servers too heavily; i.e., that load is distributed evenly. I could see running out of DB connections happening if a server gets hammered (though the connections should be freed up at some point later). Does it happen primarily to one or two servers and not all of them?

2. Try adding the following properties to the basePooledDataSource bean in datasourceContext.xml:

    <property name="logAbandoned" value="true" />
    <property name="numTestsPerEvictionRun" value="5" />

This may not resolve the issue, but perhaps the logging will provide a clue to what's going on. However, it is likely the additional logging will not trigger. The property minEvictableIdleTimeMillis is supposed to release a connection after it has been idle for the specified number of milliseconds, and the properties abandonWhenPercentageFull, removeAbandoned, and removeAbandonedTimeout, which are already specified, are supposed to clean up abandoned connections (allocated but not used in removeAbandonedTimeout seconds, checked when a new connection is requested but none are available).

In a load test scenario, though, especially one where a server is taxed very heavily, the removeAbandonedTimeout value (300 sec) may be too high: if connections are heavily used, none may be considered abandoned and harvested during the test. I wouldn't change removeAbandonedTimeout just yet. After the test completes there may be some useful log messages if some connections were consumed and not released. If nothing else, 5 minutes after the test completes you should be able to log onto the server even if all connections are consumed, since it should consider at least some of the connections abandoned and eligible for harvesting.
However, your comment that the issue still exists after leaving the system overnight makes me think the abandoned connections will not be harvested. It is still worth trying logAbandoned to see if it provides more info.

3. Are there other DB connection error messages? The ones you mentioned are for event aggregation (which runs periodically to aggregate and purge raw portal activity event data) and for jgroups (used for distributed cache management, so uPortal nodes can notify other uPortal nodes about cache replication or invalidation). Were there any for regular uPortal activity failing to get a database connection?

4. It would be great to get more information so we can try to fix the issue of the connections not being released. Even when fully consumed, the connections should be released after a period of time (after being idle for minEvictableIdleTimeMillis milliseconds, or 5 minutes at the latest per the removeAbandonedTimeout property when attempting to get a new connection and none are available). When this situation occurs are you able to look at the DB server and
Re: [uportal-dev] uPortal database connection pool size
Random collection of thoughts on this topic:

* Raw events would need more than one connection. There are 3 major interactions (might be more) with UP_RAW_EVENTS: insert for new events, read/update for aggregation, and purging.
* This pool is per node, so if you are clustering (like our 4 nodes) this can get you into trouble if you have limits on your concurrent database connections.
* It looks like we use the default pool size here on MyUW. I don't have access to the admin interface in production.
* The only time we had database connection issues was when we had a database hiccup (single point of failure) and a bunch of requests queued up.
* If we do modifications, I would also suggest upping the wait for connection timeout.

Thanks,

Tim Levett
tim.levettATwisc.edu
MyUW-Infrastructure

From: bounce-37985837-70367...@lists.wisc.edu on behalf of James Wennmacher jwennmac...@unicon.net
Sent: Tuesday, December 2, 2014 1:02 PM
To: uportal-dev@lists.ja-sig.org
Subject: [uportal-dev] uPortal database connection pool size
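
Tim's point that the pools are per node can be made concrete with some quick arithmetic. The sketch below is only an illustration: it assumes the 4-node cluster and the three pools of 75 mentioned in the thread, and it treats maxActive as a hard ceiling on each pool. The class and method names (PoolExposure, worstCaseConnections) are hypothetical and not part of uPortal.

```java
/** Back-of-the-envelope worst case for DB connections opened by a cluster. */
class PoolExposure {

    /**
     * Sums the maxActive limit of every pool on one node and multiplies by
     * the node count. Assumes every pool on every node is fully checked out.
     */
    public static int worstCaseConnections(int nodes, int... maxActivePerPool) {
        int perNode = 0;
        for (int maxActive : maxActivePerPool) {
            perNode += maxActive;
        }
        return nodes * perNode;
    }

    public static void main(String[] args) {
        int nodes = 4; // cluster size mentioned in the thread

        // Current defaults: all three pools capped at 75.
        int current = worstCaseConnections(nodes, 75, 75, 75);

        // Proposal: main pool stays at 75, both event pools drop to 5.
        int proposed = worstCaseConnections(nodes, 75, 5, 5);

        System.out.println("current defaults: " + current);  // 900
        System.out.println("proposed limits:  " + proposed); // 340
    }
}
```

Under these assumptions the proposal cuts the theoretical ceiling on the database server from 900 to 340 connections, which is the kind of limit Tim is warning about.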
Re: [uportal-dev] uPortal database connection pool size
Sounds reasonable on the maxWait increase. How about 10 sec?

Written up as https://issues.jasig.org/browse/UP-4325. I'll try to get to it later this month to give others a chance to comment on it.

James Wennmacher - Unicon
480.558.2420

On 12/02/2014 12:38 PM, Tim Levett wrote:

* If we do modifications, I would also suggest upping the wait for connection timeout.
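
How a small maxActive and a longer maxWait interact can be sketched with a toy model. This is not the pool implementation uPortal uses; it is a minimal stand-in built on java.util.concurrent.Semaphore, and the names BoundedPoolModel, borrow, giveBack, and idle are hypothetical. It assumes the limit behaves like a fixed number of permits and that a caller waits up to maxWait for one before giving up.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model of a pool limit: maxActive permits, callers wait up to maxWait for one. */
class BoundedPoolModel {
    private final Semaphore permits;
    private final long maxWaitMillis;

    public BoundedPoolModel(int maxActive, long maxWaitMillis) {
        this.permits = new Semaphore(maxActive, true); // fair: longest waiter goes first
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns true if a "connection" was obtained within maxWait, false on timeout. */
    public boolean borrow() {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Hands a "connection" back so a waiting caller can proceed. */
    public void giveBack() {
        permits.release();
    }

    /** Number of "connections" currently available. */
    public int idle() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        // maxActive of 5 as proposed; a short wait keeps the demo fast.
        BoundedPoolModel pool = new BoundedPoolModel(5, 100);
        for (int i = 0; i < 5; i++) {
            pool.borrow();
        }
        System.out.println("sixth borrow succeeded: " + pool.borrow()); // false
        pool.giveBack();
        System.out.println("after one return: " + pool.borrow());       // true
    }
}
```

In this model a sixth caller simply times out once all 5 permits are taken, so a longer wait only helps when connections are actually being returned. If they are leaking, as suspected in the forwarded message, a longer wait just delays the failure.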