Re: Performance for many small requests
Some good advice in this thread already, but given the power of the server there should be no problem serving even more requests (as long as they are small and not bound by CPU or I/O).

I'd start by looking at the JVM GC properties. Turn on GC logging with -Xloggc:/someplace/gclog_tomcat.txt -XX:+PrintGCDetails (or some verbose/timestamped variant of this) and see what your GC pauses look like and how frequent young-gen and full GCs are. With hundreds of requests coming in every second, you can get into a situation where a garbage collection pause creates a burst of requests once it is done (and there is a risk that the objects from that burst will get promoted to old gen!), causing a stampede if the server is already near its limits.

Generally you want your request objects to never reach old-gen memory and instead get collected by a parallel young-gen collector (-XX:+UseParNewGC, -XX:NewSize=???, -XX:MaxNewSize=??? and -XX:SurvivorRatio=??? are worth checking, or even try G1GC in JDK 7 if you feel adventurous). As was already mentioned, fine-tuning JVM params is a good idea, but only after checking the GC logs.

The only thing I recommend outright is pinning the JVM heap with matching -Xmx and -Xms directives; otherwise the JVM will keep allocating and releasing memory from the OS, potentially creating fragmentation issues in the long term.

And there is the question of the 64-bit JVM: unless you need a Java heap above ~1.5G, a 32-bit JVM should do just fine; with 64-bit you are just paying a huge tax in memory usage and CPU cache/TLB misses. -XX:+UseCompressedOops can help remove some of this tax, but in my opinion using a 64-bit JVM with such a small heap is only worthwhile if performance testing shows gains versus a 32-bit JVM.

--
View this message in context: http://old.nabble.com/Performance-for-many-small-requests-tp32372424p32392622.html
Sent from the Tomcat - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
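To complement the GC log advice above, here is a minimal sketch (not from the thread) of reading cumulative collection counts and times from inside the JVM through the standard java.lang.management API. The class name GcSnapshot is hypothetical, and the collector names printed depend on which GC the JVM is running with. It is no substitute for the GC log, which shows individual pauses, but it gives a quick first impression.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints one line per registered garbage collector.
public class GcSnapshot {

    // Builds a text report of cumulative GC activity since JVM start.
    public static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())   // -1 if undefined
              .append(", totalMillis=").append(gc.getCollectionTime())    // -1 if undefined
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Calling it twice a few minutes apart and comparing the numbers shows roughly how often each collector ran and how much total time it took in that interval.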
RE: Performance for many small requests
From: Darius D. [mailto:darius@gmail.com]
Subject: Re: Performance for many small requests

> in my opinion using 64bit JVM with such a small heap is only needed if performance testing shows gains versus 32bit JVM.

The main advantage of using a 64-bit JVM is the increased number of registers available in the x86-64 architecture, which can result in vastly reduced memory references. Whether or not that's important to overall performance is highly dependent on the application, of course.

 - Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
RE: Performance for many small requests
n828cl wrote:

> The main advantage of using a 64-bit JVM is the increased number of registers available in the x86-64 architecture, which can result in vastly reduced memory references. Whether or not that's important to overall performance is highly dependent on the application, of course.

Yeah, but I'd err on the 32-bit JVM side :) Gains from more registers are very specific, but the penalty from increased cache/TLB misses is big, and if you start hitting hard page faults (which would have been avoidable with the lower heap size of a 32-bit JVM), even one of those will erase all the gains :) There are reasons why Linux and other software ships compiled optimized for size, not with some fancy -O666 option.

--
View this message in context: http://old.nabble.com/Performance-for-many-small-requests-tp32372424p32392870.html
Re: Performance for many small requests
On 9/3/2011 1:15 PM, Darius D. wrote:

> Yeah, but I'd err on 32bit JVM side :) Gains from more registers are very specific, but penalty from increased cache/TLB misses is big, and if you start hitting hard page faults ( that would have been avoidable due to lower heap size with 32bit JVM ) - even one of those will erase all gains :)

Then why shouldn't I just double my heap size? Wouldn't that eliminate the risk of increased cache misses? I was just using the settings from my 32-bit installation, but have plenty of RAM to allow me to increase the memory settings if that would help.

Dave
Re: Performance for many small requests
David Kerber wrote:

> Then why shouldn't I just double my heap size? Wouldn't that eliminate the risk of increased cache misses? I was just using the settings from my 32-bit installation, but have plenty of RAM to allow me to increase the memory settings if that would help.

Umm, sorry, it seems that the 64-bit vs 32-bit discussion has thrown this thread off track. In your case it probably makes little difference. I'd definitely start by profiling and looking at the GC logs.

As a side note, (CPU) cache/TLB misses have nothing to do with heap size. Too big a heap can be as bad as too small a one (by stealing memory from the OS that could have been used for file caches and other apps, and by increasing GC pauses).

--
View this message in context: http://old.nabble.com/Performance-for-many-small-requests-tp32372424p32392937.html
RE: Performance for many small requests
From: David Kerber [mailto:dcker...@verizon.net]
Subject: Re: Performance for many small requests

> Then why shouldn't I just double my heap size? Wouldn't that eliminate the risk of increased cache misses?

As Darius stated, this part of the discussion is probably completely irrelevant to any performance issues you have. Regardless, doubling the heap size will likely _increase_ the cache misses, since you now have a larger target space being accessed through a fixed-size cache. This level of refinement is a real juggling act; unless your cores are staying very busy, it's unlikely to have any measurable effect. You need to collect more data so you can start ruling out causes, and GC information is probably the easiest to start with.

 - Chuck
RE: Performance for many small requests
-----Original Message-----
From: Darius D. [mailto:darius@gmail.com]
Sent: Saturday, September 03, 2011 1:36 PM

> As a side note - (CPU)cache/TLB misses have nothing to do with heap size. Too big heap size can be as bad as too low ( by stealing memory from OS that could have been used for file caches and other apps and increasing GC pauses ).

The Linux JVM has a nice option, -XX:+UseLargePages, to help avoid TLB misses on heap accesses. It's good for modest gains, and has the side effect of locking the heap into memory. I tend to use it on systems with large heaps.

Not much you can do about cache misses though, besides getting processors with more cache.

-Jeff
RE: Performance for many small requests
Jeff Sturm wrote:

> The Linux JVM has a nice option -XX:+UseLargePages to help avoid TLB misses on heap accesses. It's good for modest gains, and has the side effect of locking the heap into memory. I tend to use it on systems with large heaps. Not much you can do about cache misses though, besides getting processors with more cache.

Yeah, large (huge) pages are a big deal: a free 10-15% performance gain if the app is memory heavy. As a bonus you also reduce the size of the kernel page translation tables (this works wonders if you are using, for example, Oracle, which uses shared memory for its instances; savings can be up to gigabytes on big systems).

Even better news is that recent Linux kernel versions (I think 2.6.38 and above) have so-called Transparent Huge Pages, which basically enable huge pages with some magic for all processes, giving 95% of the HugePages benefits without the LargePages penalties (heap locking/pinning and app-specific config). So users of virtualisation, databases and JVMs with large heaps can rejoice :)

--
View this message in context: http://old.nabble.com/Performance-for-many-small-requests-tp32372424p32393854.html
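For anyone who wants to check whether Transparent Huge Pages are active on a given Linux box, a rough sketch follows (not from the thread). It assumes the mainline sysfs file /sys/kernel/mm/transparent_hugepage/enabled, whose content looks like "[always] madvise never" with the active mode in brackets; some distribution kernels use a different path, and the class name ThpStatus is hypothetical. On systems without that file, such as Windows, it simply reports "unknown".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports the active Transparent Huge Pages mode on Linux.
public class ThpStatus {

    // Extracts the bracketed token from text such as "[always] madvise never".
    static String activeMode(String sysfsLine) {
        int open = sysfsLine.indexOf('[');
        int close = sysfsLine.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            return "unknown";
        }
        return sysfsLine.substring(open + 1, close);
    }

    // Reads the sysfs file; the path is an assumption about a mainline kernel.
    public static String current() {
        Path p = Paths.get("/sys/kernel/mm/transparent_hugepage/enabled");
        try {
            return activeMode(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        } catch (IOException | RuntimeException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("THP mode: " + current());
    }
}
```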
Re: Performance for many small requests
On 8/31/2011 12:25 PM, Tony Anecito wrote:

> Hi David,
>
> You need to look not only at the container but also at its configuration and the JRE that is being used. There have been a lot of improvements in all areas for performance. Also, understand the servlet model; it seems developers have completely forgotten about it and how important it is.
>
> Also, I always reevaluate my design/implementation every 6 months or so and make changes based on lessons learned. Also, do not be afraid to try something new :]

I'm running JRE 6_27, in server mode, on Windows Server 2008 with 4 cores and 8GB RAM. TC 7.0.20, downloaded yesterday.

I'm having some somewhat minor performance issues: it is not performing quite as well as my Win2k machine with TC 5.5. Could somebody look at my server.xml and recommend some tweaks for handling tons of very small requests, 150 bytes per request? The requests are sent with a single HTTP POST, from ~600 remote sites collecting data every few seconds to minutes.

Would one of the thread pools help this situation?
<Server port="8005" shutdown="SHUTDOWN">
  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
  <Listener className="org.apache.catalina.core.JasperListener" />
  <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" />
  <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />
  <Listener className="org.apache.catalina.core.ThreadLocalLeakPreventionListener" />
  <GlobalNamingResources>
    <Resource name="UserDatabase" auth="Container"
              type="org.apache.catalina.UserDatabase"
              description="User database that can be updated and saved"
              factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
              pathname="conf/tomcat-users.xml" />
  </GlobalNamingResources>
  <Service name="Catalina">
    <!-- <Executor name="tomcatThreadPool" namePrefix="catalina-exec-" maxThreads="300" minSpareThreads="4"/> -->
    <Connector port="1024" protocol="HTTP/1.1" connectionTimeout="2" redirectPort="8443" maxThreads="600" acceptCount="100" minSpareThreads="10" socketBuffer="16384" />
    <!-- <Connector executor="tomcatThreadPool" port="8080" protocol="HTTP/1.1" connectionTimeout="1" redirectPort="8443" /> -->
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
    <Engine name="Catalina" defaultHost="localhost">
      <Realm className="org.apache.catalina.realm.LockOutRealm">
        <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/>
      </Realm>
      <Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="true">
      </Host>
    </Engine>
  </Service>
</Server>
Re: Performance for many small requests
What is your current response time and what did you have before?
Are you using 64-bit Java or 32-bit?
What are your heap settings?
Are you doing web services for these requests or is this straight HTML?

Regards,
-Tony

--- On Thu, 9/1/11, David kerber dcker...@verizon.net wrote:

From: David kerber dcker...@verizon.net
Subject: Re: Performance for many small requests
To: Tomcat Users List users@tomcat.apache.org
Date: Thursday, September 1, 2011, 8:40 AM

> Could somebody look at my server.xml and recommend some tweaks for handling tons of very small requests, 150 bytes per request.
Re: Performance for many small requests
On 01/09/2011 15:40, David kerber wrote:

> I'm having some somewhat minor performance issues, not performing quite as well as my Win2k machine with TC 5.5. Could somebody look at my server.xml and recommend some tweaks for handling tons of very small requests, 150 bytes per request. The requests are sent with a single http post, from ~600 remote sites collecting data every few seconds to minutes.

I think the short answer is use a profiler. If your code is the problem, fix it. If Tomcat is the problem, tell us where and we'll fix it.

> Would one of the thread pools help this situation?

Unlikely.

Mark
Re: Performance for many small requests
On 9/1/2011 11:13 AM, Tony Anecito wrote:

> What is your current response time and what did you have before?

My issue isn't response time, it's the number of requests per second handled.

> Are you using 64-bit java or 32-bit?

64-bit.

> What is your heap settings?

The Initial memory pool in tomcat7w is 256, and the max is 512. Task Manager is only showing 145MB allocated. Do those numbers refer to the heap? If not, then I don't know where to look for that.

> Are you doing web services for these requests oris this straight html?

Straight HTML POST; the app does some quick integrity checks and writes the data to disk, then returns an OK response to the client.
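On the question of whether those numbers refer to the heap: as far as I know, the "Initial memory pool" and "Maximum memory pool" fields in tomcat7w map to the initial and maximum Java heap sizes, while Task Manager shows memory for the whole process. One way to confirm from inside the JVM is the standard MemoryMXBean. This is only a sketch, not from the thread, and the class name HeapReport is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports the heap figures the running JVM actually uses.
public class HeapReport {

    // Converts bytes to whole megabytes (an undefined value of -1 becomes 0).
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return "init=" + toMb(heap.getInit()) + "MB"
             + " used=" + toMb(heap.getUsed()) + "MB"
             + " committed=" + toMb(heap.getCommitted()) + "MB"
             + " max=" + toMb(heap.getMax()) + "MB";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If this were run inside the Tomcat JVM (for example from a small test servlet or JSP), init and max would be expected to land close to the 256 and 512 values configured in tomcat7w.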
RE: Performance for many small requests
From: David kerber [mailto:dcker...@verizon.net]
Subject: Re: Performance for many small requests

>> Are you using 64-bit java or 32-bit?
> 64-bit.

Might want to try -XX:+UseCompressedOops, since you have a small heap on a 64-bit JVM.

 - Chuck
Re: Performance for many small requests
On 9/1/2011 11:36 AM, Caldarale, Charles R wrote:

> Might want to try -XX:+UseCompressedOops, since you have a small heap on a 64-bit JVM.

I'll look that one up. Is there any indication from what I've said that I need a larger heap? It doesn't look like it to me.
RE: Performance for many small requests
From: David kerber [mailto:dcker...@verizon.net]
Subject: Re: Performance for many small requests

> Is there any indication from what I've said that I need a larger heap?

Don't think so, but GC logging will tell you for sure.

The compressed OOPs capability with a small heap should not incur any encode/decode overhead and should improve cache hit rates.

 - Chuck
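To see whether compressed OOPs are actually in effect on a running JVM, something like the following sketch may help (not from the thread). It assumes a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean and Java 7 or later for getPlatformMXBean; on other JVMs, or on a 32-bit JVM where the option does not exist, it falls back to "unknown". The class name CompressedOopsCheck is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: asks HotSpot for the current value of UseCompressedOops.
public class CompressedOopsCheck {

    // Returns "true", "false", or "unknown" when the option cannot be queried.
    public static String compressedOops() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("UseCompressedOops").getValue();
        } catch (RuntimeException | LinkageError e) {
            // Non-HotSpot JVM, missing option, or bean not available.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + compressedOops());
    }
}
```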
RE: Performance for many small requests
Two things to think about in addition to the recommendation Chuck mentioned.

1. For small requests, having a large new generation in the heap helps. This helps short-lived objects in the heap get collected.

2. Lots of small disk I/O may slow you down as well. Remember HTTP is synchronous and has to wait for disk I/O unless it is spawned off to an independent thread.

Hope this helps.
-Tony

--- On Thu, 9/1/11, Caldarale, Charles R chuck.caldar...@unisys.com wrote:

From: Caldarale, Charles R chuck.caldar...@unisys.com
Subject: RE: Performance for many small requests
To: Tomcat Users List users@tomcat.apache.org
Date: Thursday, September 1, 2011, 9:58 AM

> Don't think so, but GC logging will tell you for sure.
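The second point above, handing disk I/O to an independent thread, could be sketched roughly as follows (not from the thread). It assumes Java 8 or later and uses a single background thread so that records are appended in submission order; the class name AsyncRecordWriter and the one-record-per-line format are hypothetical. Note the trade-off: if the request thread returns OK without waiting on the returned Future, a crash can lose records the client believes were stored.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: appends records to a file on a background thread.
public class AsyncRecordWriter implements AutoCloseable {
    private final Path target;
    // One worker thread keeps writes ordered and off the request threads.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    public AsyncRecordWriter(Path target) {
        this.target = target;
    }

    // Queues one record; the Future completes once the line has been written.
    public Future<?> submit(String record) {
        return writer.submit(() -> {
            byte[] line = (record + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
            Files.write(target, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return null;
        });
    }

    // Stops accepting records and waits briefly for queued writes to finish.
    @Override
    public void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

A servlet would typically create one such writer at startup, call submit() from the request-handling method, and close it when the webapp is stopped.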
Re: Performance for many small requests
On 9/1/2011 12:09 PM, Tony Anecito wrote:

> Two things to think about in addition to the recommendation Chuck mentioned.
>
> 1. For small requests having a large new generation in the heap helps. This helps with the short term objects in the heap to get collected.

I'm not sure what this means, but I'll do some digging.

> 2. Lots of small disk IO may slow you down also. Remember http is synchronous and has to wait for disk IO unless it is spawned off to an independent thread.

I have write caching turned on in the OS, and went through the synchronizing work in a previous round of performance tuning a couple of years ago.
RE: Performance for many small requests
From: David kerber [mailto:dcker...@verizon.net]
Subject: Re: Performance for many small requests

>> For small requests having a large new generation in the heap helps. This helps with the short term objects in the heap to get collected.
> I'm not sure what this means, but I'll do some digging.

In theory, current HotSpot GC algorithms should adjust the internal heap component sizes automatically, but in truly bizarre situations, one may benefit from manually controlling the boundaries. Approach any such adjustments with much caution. If your GC logging indicates that GC times are not a concern, I'd leave the finer details of heap management up to the JVM.

 - Chuck
Re: Performance for many small requests
David,

On 9/1/2011 10:40 AM, David kerber wrote:

> I'm having some somewhat minor performance issues, not performing quite as well as my Win2k machine with TC 5.5. Could somebody look at my server.xml and recommend some tweaks for handling tons of very small requests, 150 bytes per request. The requests are sent with a single http post, from ~600 remote sites collecting data every few seconds to minutes.

If the requests are small and you are making them individually, you might want to either disable HTTP keepalives or have your clients specify "Connection: close" in their request headers.

You could also use the NIO connector, which allows you to have fewer threads serve more requests without the keepalive-expiration delay.

> Would one of the thread pools help this situation?

Probably not, but I think thread pools (aka Executors) are a good idea because they can take threads out-of-service when not in use.

> <!-- <Executor name="tomcatThreadPool" namePrefix="catalina-exec-" maxThreads="300" minSpareThreads="4"/> -->
> <Connector port="1024" protocol="HTTP/1.1" connectionTimeout="2" redirectPort="8443" maxThreads="600" acceptCount="100" minSpareThreads="10" socketBuffer="16384" />
> <!-- <Connector executor="tomcatThreadPool" port="8080" protocol="HTTP/1.1" connectionTimeout="1" redirectPort="8443" /> -->
> <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

Looks like this connector has very little configuration. Is that because you aren't using it?

-chris
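On the client side, the "Connection: close" suggestion above might look roughly like this with the JDK's HttpURLConnection (a sketch, not from the thread). The class name CloseAfterPost and the example URL are hypothetical; the remote clients in this thread may well use a different HTTP library, in which case only the header itself carries over.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical client helper: prepares a one-shot POST that asks the server
// to close the connection after the response instead of keeping it alive.
public class CloseAfterPost {

    // Opens (but does not yet connect) a POST request with "Connection: close".
    public static HttpURLConnection prepare(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);                         // a request body will be written
        conn.setRequestProperty("Connection", "close"); // opt out of keep-alive
        return conn;
    }
}
```

The caller would then write the ~150-byte payload to conn.getOutputStream() and read the response code as usual; with the header set, the server has no reason to hold a thread waiting for a keepalive timeout on that connection.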
Re: Performance for many small requests
On 9/1/2011 1:15 PM, Christopher Schultz wrote:

> If the requests are small and you are making them individually, you might want to either disable HTTP keepalives or have your clients specify Connection: close in their request headers. You could also use the NIO connector which allows you to have fewer threads serve more requests without the keepalive-expiration delay.

Thanks, I'll take a look at this.

> <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
>
> Looks like this connector has very little configuration. Is that because you aren't using it?

That's correct. I didn't touch it because I don't know what to do with it, and didn't know if deleting it would cause a problem.

D
RE: Performance for many small requests
> From: David kerber [mailto:dcker...@verizon.net]
> Subject: Re: Performance for many small requests

>> You could also use the NIO connector which allows you to have fewer threads serve more requests without the keepalive-expiration delay.
>
> Thanks, I'll take a look at this.

NIO may incur slightly more overhead due to thread switching. You'll have to measure to see if it's of any benefit.

>> Looks like this connector has very little configuration. Is that because you aren't using it?
>
> I didn't touch it because I don't know what to do with it, and didn't know if deleting it would cause a problem.

It's perfectly safe to comment it out or remove it. Doing so is unlikely to have any effect on performance.

 - Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Re: Performance for many small requests
Chuck,

On 9/1/2011 2:00 PM, Caldarale, Charles R wrote:
>> From: David kerber [mailto:dcker...@verizon.net]
>> Subject: Re: Performance for many small requests
>
>>> You could also use the NIO connector which allows you to have fewer threads serve more requests without the keepalive-expiration delay.
>>
>> Thanks, I'll take a look at this.
>
> NIO may incur slightly more overhead due to thread switching. You'll have to measure to see if it's of any benefit.

Yes, but my guess is that it would be better than suffering through keepalive timeouts if the clients stupidly specify "Connection: Keep-Alive" when they don't actually intend to send multiple requests.

>>> Looks like this connector has very little configuration. Is that because you aren't using it?
>>
>> I didn't touch it because I don't know what to do with it, and didn't know if deleting it would cause a problem.
>
> It's perfectly safe to comment it out or remove it. Doing so is unlikely to have any effect on performance.

It will probably result in the creation of at least one less thread. It's also a simpler configuration, uses (slightly) less memory, and reduces any attack vectors an intruder might want to use.

-chris
RE: Performance for many small requests
> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
> Subject: Re: Performance for many small requests

>> NIO may incur slightly more overhead due to thread switching. You'll have to measure to see if it's of any benefit.
>
> Yes, but my guess is that it would be better than suffering through keepalive timeouts if the clients stupidly specify "Connection: Keep-Alive" when they don't actually intend to send multiple requests.

Agreed. I would forcibly disable keep-alive in the Connector by setting maxKeepAliveRequests to 1.

 - Chuck
Re: Performance for many small requests
Chuck,

On 9/1/2011 4:52 PM, Caldarale, Charles R wrote:
> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
> Subject: Re: Performance for many small requests
>
>>> NIO may incur slightly more overhead due to thread switching. You'll have to measure to see if it's of any benefit.
>>
>> Yes, but my guess is that it would be better than suffering through keepalive timeouts if the clients stupidly specify "Connection: Keep-Alive" when they don't actually intend to send multiple requests.
>
> Agreed. I would forcibly disable keep-alive in the Connector by setting maxKeepAliveRequests to 1.

There's that, too :)

-chris
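The client-side half of the keep-alive advice in this sub-thread (having each client send "Connection: close") might look roughly like the sketch below. It assumes the clients are written in Java and use the JDK's HttpURLConnection, which the thread does not say; the class name CloseAfterPost and the /upload path are hypothetical, and port 1024 is taken from the posted Connector.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical client helper; not code from the thread.
final class CloseAfterPost {

    /** Builds (but does not yet open) a POST connection that asks the server to close the socket after replying. */
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Without this, an HTTP/1.1 client defaults to keep-alive and the server
            // thread may sit waiting for a follow-up request until the timeout expires.
            conn.setRequestProperty("Connection", "close");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends one small payload and returns the HTTP status code. */
    static int post(String url, String line) throws IOException {
        HttpURLConnection conn = prepare(url);
        byte[] body = line.getBytes(StandardCharsets.UTF_8);
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed endpoint for illustration only.
        System.out.println(post("http://localhost:1024/upload", "sample data line"));
    }
}
```

Setting maxKeepAliveRequests to 1 on the Connector, as suggested above, achieves the same effect from the server side without touching the clients, so this sketch only matters if the clients can be changed.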
Re: Performance for many small requests
On 31/08/2011 15:11, David kerber wrote:
> Has there been any comparison testing done in how the latest 7.x version of TC will compare to the latest 6.0.x version, in the case of tons (hundreds per second) of very small, quick-to-process requests?

Not that I am aware of.

> I have a machine that's starting to croak and am moving to a new machine, and need to decide whether to use the latest 6.x, or the latest 7.x version on Windows 2008. The app will of course be identical in either situation, and all it does is take data sent in the upload request, un-obfuscate it, and write it to disk after doing a checksum test. So it's minimal processing, but there are a LOT of them.
>
> Thanks for any insight!

I'd expect very little difference. I'd use the latest 7.0.x since the release cycle is a lot shorter. If you do find any problems / bottlenecks with 7.0.x, raise them here and they should get fixed pretty promptly.

Mark
Re: Performance for many small requests
On Wed, Aug 31, 2011 at 16:11, David kerber <dcker...@verizon.net> wrote:
> Has there been any comparison testing done in how the latest 7.x version of TC will compare to the latest 6.0.x version, in the case of tons (hundreds per second) of very small, quick-to-process requests?
>
> I have a machine that's starting to croak and am moving to a new machine, and need to decide whether to use the latest 6.x, or the latest 7.x version on Windows 2008. The app will of course be identical in either situation, and all it does is take data sent in the upload request, un-obfuscate it, and write it to disk after doing a checksum test. So it's minimal processing, but there are a LOT of them.
>
> Thanks for any insight!

Well, first things first, ensure keepalive works properly for the connector(s) you use, but I guess you have it covered already, right? Apart from that...

--
Francis Galiegue
ONE2TEAM
Ingénieur système
Mob : +33 (0) 683 877 875
Tel : +33 (0) 178 945 552
f...@one2team.com
40 avenue Raymond Poincaré
75116 Paris
Re: Performance for many small requests
On 8/31/2011 10:18 AM, Francis GALIEGUE wrote:
> On Wed, Aug 31, 2011 at 16:11, David kerber <dcker...@verizon.net> wrote:
>> Has there been any comparison testing done in how the latest 7.x version of TC will compare to the latest 6.0.x version, in the case of tons (hundreds per second) of very small, quick-to-process requests?
>>
>> I have a machine that's starting to croak and am moving to a new machine, and need to decide whether to use the latest 6.x, or the latest 7.x version on Windows 2008. The app will of course be identical in either situation, and all it does is take data sent in the upload request, un-obfuscate it, and write it to disk after doing a checksum test. So it's minimal processing, but there are a LOT of them.
>>
>> Thanks for any insight!
>
> Well, first things first, ensure keepalive works properly for the connector(s) you use, but I guess you have it covered already, right? Apart from that...

Yeah, I've got the general performance issues worked out long ago, but that was on 5.5.x. I just wanted to be sure there was little to no difference expected between the two current versions before I went ahead.

D
Re: Performance for many small requests
On 8/31/2011 10:16 AM, Mark Thomas wrote:
> On 31/08/2011 15:11, David kerber wrote:
>> Has there been any comparison testing done in how the latest 7.x version of TC will compare to the latest 6.0.x version, in the case of tons (hundreds per second) of very small, quick-to-process requests?
>
> Not that I am aware of.
>
>> I have a machine that's starting to croak and am moving to a new machine, and need to decide whether to use the latest 6.x, or the latest 7.x version on Windows 2008. The app will of course be identical in either situation, and all it does is take data sent in the upload request, un-obfuscate it, and write it to disk after doing a checksum test. So it's minimal processing, but there are a LOT of them.
>>
>> Thanks for any insight!
>
> I'd expect very little difference. I'd use the latest 7.0.x since the release cycle is a lot shorter. If you do find any problems / bottlenecks with 7.0.x, raise them here and they should get fixed pretty promptly.
>
> Mark

Thanks; that's pretty much what I expected, but wanted to confirm.

D
Re: Performance for many small requests
Hi David,

You need to look not only at the container but also at its configuration and the JRE that is being used. There have been a lot of performance improvements in all areas. Also, understand the servlet model; it seems developers have completely forgotten about it and how important it is. Also, I always re-evaluate my design/implementation every 6 months or so and make changes based on lessons learned. Also, do not be afraid to try something new :]

Regards,
Tony Anecito
Founder/President
MyUniPortal
http://www.myuniportal.com
2010 JavaOne Dukes Award winner (Yes I am using Tomcat :])
2010 JavaOne Outstanding Developer Award

--- On Wed, 8/31/11, David kerber <dcker...@verizon.net> wrote:

> From: David kerber <dcker...@verizon.net>
> Subject: Re: Performance for many small requests
> To: Tomcat Users List <users@tomcat.apache.org>
> Date: Wednesday, August 31, 2011, 8:22 AM
>
> On 8/31/2011 10:18 AM, Francis GALIEGUE wrote:
>> On Wed, Aug 31, 2011 at 16:11, David kerber <dcker...@verizon.net> wrote:
>>> Has there been any comparison testing done in how the latest 7.x version of TC will compare to the latest 6.0.x version, in the case of tons (hundreds per second) of very small, quick-to-process requests?
>>>
>>> I have a machine that's starting to croak and am moving to a new machine, and need to decide whether to use the latest 6.x, or the latest 7.x version on Windows 2008. The app will of course be identical in either situation, and all it does is take data sent in the upload request, un-obfuscate it, and write it to disk after doing a checksum test. So it's minimal processing, but there are a LOT of them.
>>>
>>> Thanks for any insight!
>>
>> Well, first things first, ensure keepalive works properly for the connector(s) you use, but I guess you have it covered already, right? Apart from that...
>
> Yeah, I've got the general performance issues worked out long ago, but that was on 5.5.x. I just wanted to be sure there was little to no difference expected between the two current versions before I went ahead.
>
> D
Re: Performance with many small requests
All,

On 5/12/2009 9:38 AM, Caldarale, Charles R wrote:
> Might be interesting to modify it to run with more cores, if you have a system available.

Here are the results I got on two different systems. Note that I compiled the test code using the 1.5 JVM though it shouldn't matter at all. I ran all these tests with little to no load on the server (I stopped all TC instances to keep people from hitting them and wasting server time :)

SYSTEM 1: 32-bit GNU/Linux kernel 2.6.14

    model name : AMD Athlon(tm) XP 1700+
    cpu MHz    : 1470.260
    bogomips   : 2945.26

*** Java 1.5/client

    $ java -version
    java version "1.5.0_13"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05)
    Java HotSpot(TM) Client VM (build 1.5.0_13-b05, mixed mode)

    $ java TestSynch 1
    secondary atomic time: 2010; ticks: 48157094
    primary atomic time: 1981; ticks: 51842907
    primary synchronized time: 40940; ticks: 49988735
    secondary synchronized time: 40850; ticks: 50011266

    $ java TestSynch 1
    secondary atomic time: 2032; ticks: 49652307
    primary atomic time: 1997; ticks: 50347694
    primary synchronized time: 41086; ticks: 55617866
    secondary synchronized time: 40998; ticks: 44382135

*** Java 1.5/server

    $ java -version -server
    java version "1.5.0_13"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05)
    Java HotSpot(TM) Server VM (build 1.5.0_13-b05, mixed mode)

    $ java -server TestSynch 1
    secondary atomic time: 897; ticks: 47771660
    primary atomic time: 860; ticks: 52228341
    primary synchronized time: 37749; ticks: 49503874
    secondary synchronized time: 37644; ticks: 50496127

    $ java -server TestSynch 1
    primary atomic time: 882; ticks: 55689446
    secondary atomic time: 955; ticks: 44310555
    primary synchronized time: 39245; ticks: 45526991
    secondary synchronized time: 39350; ticks: 54473010

*** Java 1.6/client

    $ java -version
    java version "1.6.0_13"
    Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
    Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode, sharing)

    $ java TestSynch 1
    secondary atomic time: 959; ticks: 47824199
    primary atomic time: 980; ticks: 52175802
    primary synchronized time: 26029; ticks: 56339037
    secondary synchronized time: 24232; ticks: 43660964

    $ java TestSynch 1
    secondary atomic time: 1050; ticks: 47887651
    primary atomic time: 1020; ticks: 52112350
    secondary synchronized time: 25947; ticks: 42345253
    primary synchronized time: 26042; ticks: 57654748

*** Java 1.6/server

    $ java -server -version
    java version "1.6.0_13"
    Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
    Java HotSpot(TM) Server VM (build 11.3-b02, mixed mode, sharing)

    $ java -server TestSynch 1
    secondary atomic time: 973; ticks: 46198801
    primary atomic time: 942; ticks: 53801200
    secondary synchronized time: 449; ticks: 3906780
    primary synchronized time: 2256; ticks: 96093221

    $ java -server TestSynch 1
    secondary atomic time: 928; ticks: 55025620
    primary atomic time: 924; ticks: 44974381
    primary synchronized time: 2672; ticks: 44065122
    secondary synchronized time: 2568; ticks: 55934879

SYSTEM 2: 32-bit Windows Vista SP1

    Processor: Core 2 Duo Merom T7500 (2.2GHz)

    C:\Users\chris\Desktop>java -client -version
    java version "1.6.0_13"
    Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
    Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode, sharing)

    C:\Users\chris\Desktop>java -client TestSynch 1
    primary atomic time: 4034; ticks: 52861711
    secondary atomic time: 4034; ticks: 47138290
    secondary synchronized time: 21446; ticks: 50758159
    primary synchronized time: 21446; ticks: 49241842

    C:\Users\chris\Desktop>java -client TestSynch 1
    primary atomic time: 4351; ticks: 45396375
    secondary atomic time: 4351; ticks: 54603626
    secondary synchronized time: 18824; ticks: 50273205
    primary synchronized time: 18824; ticks: 49726796

Oddly enough, I don't have the server VM installed, so I can't check that performance right now.

Seems that on my two systems, atomics are faster than language-level synchronization, regardless of client versus server, or 1.5 versus 1.6 (though using -server and using 1.6 each give a significant performance boost to the language-level synchronization).

I didn't find this code to exhibit high lock contention: there are only two threads at work, though they are doing nothing but acquiring locks (and incrementing integers, which should be trivial).

-chris
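For readers who do not have the TestSynch attachment, the comparison being discussed can be approximated with the sketch below. It is not Chuck's program: the class name CounterRace, the two-thread layout, and the default loop count are assumptions made for illustration, and a crude timing loop like this is sensitive to JIT warm-up, so treat any numbers as rough.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for the thread's TestSynch, which is not reproduced here.
final class CounterRace {

    /** Two threads each increment a shared AtomicInteger; returns the final count. */
    static int runAtomic(int loopsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        runTwoThreads(() -> {
            for (int i = 0; i < loopsPerThread; i++) {
                counter.incrementAndGet();
            }
        });
        return counter.get();
    }

    /** Two threads each increment a plain int guarded by a synchronized block; returns the final count. */
    static int runSynchronized(int loopsPerThread) {
        final int[] counter = {0};
        final Object lock = new Object();
        runTwoThreads(() -> {
            for (int i = 0; i < loopsPerThread; i++) {
                synchronized (lock) {
                    counter[0]++;
                }
            }
        });
        // join() in runTwoThreads makes the workers' writes visible here.
        return counter[0];
    }

    private static void runTwoThreads(Runnable work) {
        Thread primary = new Thread(work, "primary");
        Thread secondary = new Thread(work, "secondary");
        primary.start();
        secondary.start();
        try {
            primary.join();
            secondary.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int loops = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        long t0 = System.nanoTime();
        runAtomic(loops);
        long t1 = System.nanoTime();
        runSynchronized(loops);
        long t2 = System.nanoTime();
        System.out.println("atomic time (ms): " + (t1 - t0) / 1_000_000);
        System.out.println("synchronized time (ms): " + (t2 - t1) / 1_000_000);
    }
}
```

As the thread notes, results differ by JVM version, client versus server compiler, OS and CPU, so it is worth running this several times on the actual target machine before drawing conclusions.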
Re: Performance with many small requests
On Tue, May 12, 2009 at 6:27 AM, Caldarale, Charles R <chuck.caldar...@unisys.com> wrote:
>> From: David Kerber [mailto:dcker...@verizon.net]
>> Subject: Re: Performance with many small requests
>>
>> Incrementing a counter can't be much of a synchronization bottleneck, and if I switch to an AtomicInteger, it should be even less of one.
>
> Actually, it won't. There's a slight performance difference between the two mechanisms, but it's usually in favor of the synchronized increment, not the AtomicInteger, at least on my dual-core AMD 64 system running JDK 6u12 in 64-bit server mode on Vista. The difference is only a few percent, so you should just code it whichever way you find more maintainable. (Test program available on request; it would be interesting to see if the same relationship exists on a modern Intel chip.)

Hello,

last time I checked (which is a while ago - 2006 and on 1.5) it was not only processor, but also OS dependent and clearly in favor of atomics (but it probably depends on the number of concurrent writers too). If you would share your test code, I would love to test it on some *nixes and darwins I have here; I'd also volunteer to gather and publish results from everyone else :-)

regards
Leon
RE: Performance with many small requests
> From: David Kerber [mailto:dcker...@verizon.net]
> I definitely should hook a profiler to the app so I can be sure of what's taking the time, though.

Yes. If you don't measure it, you don't know whether you're fixing the right problem!

Also consider connector, then if necessary process and OS limits on the number of concurrent connections. Do you usually have connector threads sat idle, or are they all reading and processing requests most/all of the time? A thread dump will tell you - the last one you posted had at least one thread in the pool waiting for a connection, and you can simply spot which others look similar. The other way to check would be to monitor the depth of your connector's socket's accept queue, but I'm not aware of any way to do this in Windows.

At this point, I'm guessing on any remaining bottlenecks. I recall your network is gigabit from the router (I think I've recalled correctly), but also check:

- Is the firewall or router overloaded? Highly unlikely if they're properly specced, but I have been in one data centre where the bottleneck turned out to be the routers.*

- What's your external connectivity like? Gigabit from the router is irrelevant if you're trying to fit 20 Mbit/s of data down a 10 Mbit/s pipe :-).

 - Peter

* Names elided to protect the innocent, but a manufacturer's claim that a particular spec of router could handle two ISDN primaries turned out to be correct in the USA (23 B-channels per PRI) and wrong in Europe (30 B-channels per PRI).
Re: Performance with many small requests
Leon Rosenberg wrote:
> On Tue, May 12, 2009 at 6:27 AM, Caldarale, Charles R <chuck.caldar...@unisys.com> wrote:
>>> From: David Kerber [mailto:dcker...@verizon.net]
>>> Subject: Re: Performance with many small requests
>>>
>>> Incrementing a counter can't be much of a synchronization bottleneck, and if I switch to an AtomicInteger, it should be even less of one.
>>
>> Actually, it won't. There's a slight performance difference between the two mechanisms, but it's usually in favor of the synchronized increment, not the AtomicInteger, at least on my dual-core AMD 64 system running JDK 6u12 in 64-bit server mode on Vista. The difference is only a few percent, so you should just code it whichever way you find more maintainable. (Test program available on request; it would be interesting to see if the same relationship exists on a modern Intel chip.)
>
> Hello,
>
> last time I checked (which is a while ago - 2006 and on 1.5) it was not only processor, but also OS dependent and clearly in favor of atomics (but it probably depends on the number of concurrent writers too). If you would share your test code, I would love to test it on some *nixes and darwins I have here; I'd also volunteer to gather and publish results from everyone else :-)

I'll second that request!

Dave
Re: Performance with many small requests
Peter Crowther wrote:
>> From: David Kerber [mailto:dcker...@verizon.net]
>> I definitely should hook a profiler to the app so I can be sure of what's taking the time, though.
>
> Yes. If you don't measure it, you don't know whether you're fixing the right problem!

It was apparent early on that the synchronization was the most limiting bottleneck, and that has been mostly corrected thanks to you guys. Now I'm looking at various possibilities for the secondary bottlenecks.

> Also consider connector, then if necessary process and OS limits on the number of concurrent connections. Do you usually have connector threads sat idle, or are they all reading and processing requests most/all of the time? A thread dump will tell you - the last one you posted had at least one thread in the pool waiting for a

Yes, I usually have several waiting on the socket, either at my InputStream.read() line, or in some tomcat code that Chuck said was waiting for http headers. However, I still have more completely idle (sleeping) threads than I do busy or locked ones at any given time, so the servlet seems to be keeping up pretty well overall. See below, though...

> connection, and you can simply spot which others look similar. The other way to check would be to monitor the depth of your connector's socket's accept queue, but I'm not aware of any way to do this in Windows.
>
> At this point, I'm guessing on any remaining bottlenecks. I recall your network is gigabit from the router (I think I've recalled correctly), but also check:
>
> - Is the firewall or router overloaded? Highly unlikely if they're properly specced, but I have been in one data centre where the bottleneck turned out to be the routers.*

In my original post, I posted a bunch of numbers about network and other possible bottlenecks, and what it boiled down to was that neither my firewall load, nor total internet connection bandwidth were close to their limits.

I do have questions about the number of connections that the OS networking stack can handle, but have not figured out how to check on that. I also need to investigate some possible latency (as opposed to throughput) issues in my network, given the small request size.

> - What's your external connectivity like? Gigabit from the router is irrelevant if you're trying to fit 20 Mbit/s of data down a 10 Mbit/s pipe :-).

The outside world connection is a full T-1, running about 40% - 50% capacity on average.

D
RE: Performance with many small requests
> From: David kerber [mailto:dcker...@verizon.net]
> In my original post, I posted a bunch of numbers about network and other possible bottlenecks, and what it boiled down to was that neither my firewall load, nor total internet connection bandwidth were close to their limits.

Thanks. Apologies for not referring back!

> I do have questions about the number of connections that the OS networking stack can handle, but have not figured out how to check on that.

As a first step:

    netstat -an > somefile.txt

How many TCP sockets are there in the result?

> The outside world connection is a full T-1, running about 40% - 50% capacity on average.

Dedicated or contended bandwidth? Can you get the other 50-60% out of it if you try hard from another machine on the same network, or do you never get it in reality?

 - Peter
Re: Performance with many small requests
Peter Crowther wrote:
>> From: David kerber [mailto:dcker...@verizon.net]
>> In my original post, I posted a bunch of numbers about network and other possible bottlenecks, and what it boiled down to was that neither my firewall load, nor total internet connection bandwidth were close to their limits.
>
> Thanks. Apologies for not referring back!

No problem; that was many posts ago...

>> I do have questions about the number of connections that the OS networking stack can handle, but have not figured out how to check on that.
>
> As a first step:
>
>     netstat -an > somefile.txt
>
> How many TCP sockets are there in the result?

Just over 1000 total, 810 to the port that this application is using. The vast majority are showing a status of TIME_WAIT, a dozen or so in ESTABLISHED and one (I think) in FIN_WAIT_1.

>> The outside world connection is a full T-1, running about 40% - 50% capacity on average.
>
> Dedicated or contended bandwidth? Can you get the other 50-60% out of it if you try hard from another machine on the same network, or do you never get it in reality?

That's our corporate connection, so it's shared across all users. I can easily run it up to 100% by doing a large d/l from somewhere (I need to plan my patch Tuesday updates to avoid trouble), so my router and firewall have no trouble handling the full bandwidth. However, those are low numbers of high-throughput connections. This app produces large numbers of connections, each with small amounts of data, so it may scale differently.

D
RE: Performance with many small requests
> From: David kerber [mailto:dcker...@verizon.net]
> Just over 1000 total, 810 to the port that this application is using.

Should be fine on Windows.

> The vast majority are showing a status of TIME_WAIT, a dozen or so in ESTABLISHED and one (I think) in FIN_WAIT_1.

Sounds fair enough. The ESTABLISHED ones are active both ways and able to transfer data; the one in FIN_WAIT_1 has been closed at one end but the other end's still open; and the ones in TIME_WAIT are closed but tombstoned so the TCP stack knows to throw away any data that arrives for them. None of those are a surprise.

> That's our corporate connection, so it's shared across all users. I can easily run it up to 100% by doing a large d/l from somewhere (I need to plan my patch Tuesday updates to avoid trouble), so my router and firewall have no trouble handling the full bandwidth.

Ah, OK.

> However, those are low numbers of high-throughput connections. This app produces large numbers of connections, each with small amounts of data, so it may scale differently.

It may, but I'd be a little surprised - IP is IP, and you have enough concurrency that latency shouldn't be a problem. That said, if a client has multiple data items to send in rapid succession, does it accumulate those and batch them, or does it send each one as a different request? Or does the situation never arise?

 - Peter
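Counting sockets per state by eye in a netstat dump of a thousand lines is tedious, so a small tally program can help. The sketch below is one possible way to do it, assuming the usual `netstat -an` layout in which TCP lines start with "TCP" (or "tcp") and end with the state name; the class name NetstatTally is hypothetical and the file name comes from the command line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for summarising saved "netstat -an" output.
final class NetstatTally {

    /** Counts TCP lines by their last column, which is assumed to be the connection state. */
    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip headers, UDP entries and anything else that is not a TCP row.
            if (!line.regionMatches(true, 0, "tcp", 0, 3)) {
                continue;
            }
            String[] fields = line.split("\\s+");
            if (fields.length < 4) {
                continue;
            }
            counts.merge(fields[fields.length - 1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 accepts any byte sequence, so an unexpected console code page will not abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        countStates(lines).forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Run it as `java NetstatTally somefile.txt`. A large TIME_WAIT count next to a small ESTABLISHED count, as reported above, is what one would expect when every request opens and closes its own connection.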
RE: Performance with many small requests
> From: Peter Crowther [mailto:peter.crowt...@melandra.com]
> Subject: RE: Performance with many small requests
>
> That said, if a client has multiple data items to send in rapid succession, does it accumulate those and batch them, or does it send each one as a different request? Or does the situation never arise?

Continuing with that thought, are the requests from a single client frequent enough to warrant using keepalives? Building and tearing down the TCP session on each request might be adding noticeable delay, although your analysis of the heap dumps hasn't shown that yet.

 - Chuck
Re: Performance with many small requests
Peter Crowther wrote:
>> From: David kerber [mailto:dcker...@verizon.net]
>> Just over 1000 total, 810 to the port that this application is using.
>
> Should be fine on Windows.

That was my gut feeling too, but I'm glad to have it confirmed.

>> The vast majority are showing a status of TIME_WAIT, a dozen or so in ESTABLISHED and one (I think) in FIN_WAIT_1.
>
> Sounds fair enough. The ESTABLISHED ones are active both ways and able to transfer data; the one in FIN_WAIT_1 has been closed at one end but the other end's still open; and the ones in TIME_WAIT are closed but tombstoned so the TCP stack knows to throw away any data that arrives for them. None of those are a surprise.
>
>> That's our corporate connection, so it's shared across all users. I can easily run it up to 100% by doing a large d/l from somewhere (I need to plan my patch Tuesday updates to avoid trouble), so my router and firewall have no trouble handling the full bandwidth.
>
> Ah, OK.
>
>> However, those are low numbers of high-throughput connections. This app produces large numbers of connections, each with small amounts of data, so it may scale differently.
>
> It may, but I'd be a little surprised - IP is IP, and you have enough concurrency that latency shouldn't be a problem.

I was wondering about that. I knew total data throughput wasn't a major issue here, but wasn't sure how latency would affect it.

> That said, if a client has multiple data items to send in rapid succession, does it accumulate those and batch them, or does it send each one as a different request? Or does the situation never arise?

A typical client will have 2 to 5 items to send per transaction (they're actually lines from a data logger's data file), and each line is done in a separate POST request. The frequency of transactions varies widely, but typically won't exceed one every 10 or 15 seconds from any given site. As I mentioned earlier, each data line is small, 20 to 50 bytes.

We had looked at batching up the transmissions before, and it's still an option. However that adds a bit of complexity to the software on both ends, though the gain would be far fewer individual requests to process. For now, we prefer the simplicity of line-by-line transmission, but if we start running into network limitations we'll probably start batching them up.

D
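To give a feel for how much complexity client-side batching would add, here is a minimal sketch of the accumulating half. It assumes the data lines never contain a newline, so a newline can serve as the separator, and that the server would split the body the same way; neither assumption comes from the thread, and LineBatcher is a made-up name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side accumulator; the real logger software is not shown in the thread.
final class LineBatcher {
    private final List<String> pending = new ArrayList<>();

    /** Queues one data line for the next transmission. */
    void add(String line) {
        pending.add(line);
    }

    /** Number of lines waiting to be sent. */
    int size() {
        return pending.size();
    }

    /** Returns all queued lines as one newline-delimited POST body and empties the queue. */
    byte[] drain() {
        String body = String.join("\n", pending);
        pending.clear();
        return body.getBytes(StandardCharsets.UTF_8);
    }
}
```

A client would call add() for each of its 2 to 5 lines and send drain() as a single POST. The cost David mentions shows up on the server, which then has to decide what to do when only some lines in a batch pass the checksum test.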
Re: Performance with many small requests
Caldarale, Charles R wrote:
>> From: Peter Crowther [mailto:peter.crowt...@melandra.com]
>> Subject: RE: Performance with many small requests
>>
>> That said, if a client has multiple data items to send in rapid succession, does it accumulate those and batch them, or does it send each one as a different request? Or does the situation never arise?
>
> Continuing with that thought, are the requests from a single client frequent enough to warrant using keepalives? Building and tearing down the TCP session on each request might be adding noticeable delay, although your analysis of the heap dumps hasn't shown that yet.

See the message I just sent. How difficult are keepalives to implement? Our app design is such that we are never supposed to go longer than 5 minutes without at least a status update transmission.

Dave
[OT] RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net]
A typical client will have 2 to 5 items to send per transaction (they're actually lines from a data logger's data file), and each line is done in a separate POST request. The frequency of transactions varies widely, but typically won't exceed one every 10 or 15 seconds from any given site. As I mentioned earlier, each data line is small, 20 to 50 bytes.
OK, so your top end is about 1 line every 2 seconds. You'll need at least 2 round-trip times (RTT) per line (SYN out, SYN-ACK back, ACK-DATA out, ACK-DATA back, plus the FIN-ACK out), but that's not a high rate.
We had looked at batching up the transmissions before, and it's still an option. However that adds a bit of complexity to the software on both ends, though the gain would be far fewer individual requests to process. For now, we prefer the simplicity of line-by-line transmission, but if we start running into network limitations we'll probably start batching them up.
I'm interested - and this is now a long way from Tomcat, hence the [OT] mark above. If a set of lines represents one transaction, why would you ever not send it and try to process it atomically? Or is it acceptable to have part-transactions within your system?
- Peter
RE: Performance with many small requests
From: Leon Rosenberg [mailto:rosenberg.l...@googlemail.com] Subject: Re: Performance with many small requests If you would share your test code, I would love to test it on some *nixes and darwins I have here; Here's the code I used to do the synch vs atomic testing. The command line parameter is the number of loops to perform; you'll want to set it to at least 10, and even then run repeated tests - the timings can vary considerably, at least under Vista. (Also being sent directly to the two requesters, in case the list strips the attachment.) Might be interesting to modify it to run with more cores, if you have a system available. - Chuck THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Re: Performance with many small requests
Caldarale, Charles R wrote: From: Leon Rosenberg [mailto:rosenberg.l...@googlemail.com] Subject: Re: Performance with many small requests
If you would share your test code, I would love to test it on some *nixes and darwins I have here;
Here's the code I used to do the synch vs atomic testing. The command line parameter is the number of loops to perform; you'll want to set it to at least 10, and even then run repeated tests - the timings can vary considerably, at least under Vista. (Also being sent directly to the two requesters, in case the list strips the attachment.) Might be interesting to modify it to run with more cores, if you have a system available. - Chuck
My dev machine: WinXP SP3, dual-core 2.8GHz processor, java 1.5.0_12. First, I ran it in Eclipse as supplied, with looplimit = 1000, and got:
secondary atomic time: 6890; ticks: 51773402
primary atomic time: 6890; ticks: 48226599
secondary synchronized time: 21281; ticks: 50282172
primary synchronized time: 21281; ticks: 49717829
Then I reversed the order of the tests (just to be sure it didn't matter) and got similar results:
secondary synchronized time: 21219; ticks: 49601191
primary synchronized time: 21234; ticks: 50398810
secondary atomic time: 6734; ticks: 52111089
primary atomic time: 6734; ticks: 47888912
Running at a command line (java -cp . TestSynch) gave me rather different results (qualitatively similar, quantitatively rather different):
primary synchronized time: 42998; ticks: 59125831
secondary synchronized time: 42998; ticks: 40874170
secondary atomic time: 4953; ticks: 49025722
primary atomic time: 4953; ticks: 50974279
After several tests, the ratio between the synchronized and atomic times varied between about 5 and 9, but atomic was always the lower time.
Running two instances simultaneously didn't change the numbers much (as expected from a dual-core machine), but the command window with the focus always ran significantly faster than the one without it, no matter which one was started first. One very surprising result (to me, anyway) was that 4 instances only extended the time numbers slightly (10%) for the synchronized run, and even less for the atomic run. Going to 8 instances made a dramatic increase in the synchronized time, but again only a slight increase in the atomic version. 16 instances was too much for my system; it took a long time to start the last 8 or so, and both the atomic and the synchronized versions took a lot longer.
From these tests, it looks like, under Windows XP and Java 1.5 anyway, atomics are always faster, and they also handle increasing concurrency much better than synchronized blocks do. Now to test on my server!!
Dave
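For readers without the attachment, a comparison of this kind can be sketched roughly as below. This is not the TestSynch program discussed above: the class name, method names, and two-thread setup are assumptions made for illustration, and the lambdas mean it needs Java 8 or later rather than the 1.5 JVM used in these tests.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical comparison harness; NOT the TestSynch program from the thread.
class CounterComparison {
    private int syncCount = 0;                                   // guarded by the object's monitor
    private final AtomicInteger atomicCount = new AtomicInteger();

    synchronized void incrementSync() { syncCount++; }
    synchronized int getSync() { return syncCount; }

    void incrementAtomic() { atomicCount.incrementAndGet(); }
    int getAtomic() { return atomicCount.get(); }

    // Starts `threads` workers that each call `work` `loops` times,
    // waits for all of them, and returns the elapsed wall-clock milliseconds.
    static long time(int threads, int loops, Runnable work) {
        Thread[] pool = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                for (int n = 0; n < loops; n++) {
                    work.run();
                }
            });
            pool[i].start();
        }
        try {
            for (Thread t : pool) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int loops = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        CounterComparison c = new CounterComparison();
        long syncMs = time(2, loops, c::incrementSync);
        long atomicMs = time(2, loops, c::incrementAtomic);
        System.out.println("synchronized: " + syncMs + " ms, count " + c.getSync());
        System.out.println("atomic:       " + atomicMs + " ms, count " + c.getAtomic());
    }
}
```

Timings from a sketch like this are only indicative: there is no JIT warm-up phase, and as the results above show, the numbers shift with JVM version, CPU, and even which window has focus. Both counters should always end at threads x loops, which is the part that matters for correctness.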
Re: Performance with many small requests
Chuck, On 5/12/2009 12:27 AM, Caldarale, Charles R wrote: From: David Kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests Incrementing a counter can't be much of a synchronization bottleneck, and if I switch to an AtomicInteger, it should be even less of one. Actually, it won't. There's a slight performance difference between the two mechanisms, but it's usually in favor of the synchronized increment, not the AtomicInteger, at least on my dual-core AMD 64 system running JDK 6u12 in 64-bit server mode on Vista. The difference is only a few percent, so you should just code it whichever way you find more maintainable. (Test program available on request; it would be interesting to see if the same relationship exists on a modern Intel chip.) High monitor contention or low? I can run your test code on a Core 2 Duo if you want to publish it. -chris
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests From these tests, it looks like, under windows XP and java 1.5 any way, that atomics are always faster Try it under 1.6; Sun made major improvements to synchronization handling between 1.5 and 1.6. When I reran my tests on 1.5 (which I don't use these days), I got numbers similar to yours. 1.6 is much, much faster. Also, what is your CPU type? Intel and AMD may have significant differences, as may 32- vs 64-bit. - Chuck
Re: Performance with many small requests
Caldarale, Charles R wrote: From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests
From these tests, it looks like, under windows XP and java 1.5 any way, that atomics are always faster
Try it under 1.6; Sun made major improvements to synchronization handling between 1.5 and 1.6. When I reran my tests on 1.5 (which I don't use these days), I got numbers similar to yours. 1.6 is much, much faster.
That's good to know; that would be an incentive for me to migrate this app to 1.6 and Tomcat 6.
Also, what is your CPU type? Intel and AMD may have significant differences, as may 32- vs 64-bit.
AMD 64 x2, running 32-bit windows XP
- Chuck
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests That's good to know; that would be an incentive for me to migrate this app to 1.6 and Tomcat 6. You don't need to move to Tomcat 6 to use a 1.6 JVM; you can use the Tomcat you already have. - Chuck
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests How difficult are keepalives to implement? That would depend on your client. Looks like the Apache http client supports it, but I haven't used it. - Chuck
Re: Performance with many small requests
On May 12, 2009, at 13:09, Caldarale, Charles R chuck.caldar...@unisys.com wrote: From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests
From these tests, it looks like, under windows XP and java 1.5 any way, that atomics are always faster
Try it under 1.6; Sun made major improvements to synchronization handling between 1.5 and 1.6. When I reran my tests on 1.5 (which I don't use these days), I got numbers similar to yours. 1.6 is much, much faster.
This reminds me of performance optimizations that people used to make in their Java code, such as converting String objects to byte arrays to do operations on them because everyone knew that it was faster. Then, Sun came along and optimized the String API implementation, causing all those optimizations to then be slower than the straightforward implementations of string ops. That optimized code also has the added advantage of being confusing to read. I agree with Chuck's assertion that understandability ought to be a more important goal than maximum possible performance.
-chris
Re: Performance with many small requests
Christopher Schultz wrote: On May 12, 2009, at 13:09, Caldarale, Charles R chuck.caldar...@unisys.com wrote: From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests
From these tests, it looks like, under windows XP and java 1.5 any way, that atomics are always faster
Try it under 1.6; Sun made major improvements to synchronization handling between 1.5 and 1.6. When I reran my tests on 1.5 (which I don't use these days), I got numbers similar to yours. 1.6 is much, much faster.
This reminds me of performance optimizations that people used to make in their Java code, such as converting String objects to byte arrays to do operations on them because everyone knew that it was faster. Then, Sun came along and optimized the String API implementation, causing all those optimizations to then be slower than the straightforward implementations of string ops. That optimized code also has the added advantage of being confusing to read.
When (what java version) did those string operation optimizations happen? Sun's web page that talks about this (and explicitly says that string buffers are usually faster than direct string operations) doesn't mention a specific java version.
I agree with Chuck's assertion that understandability ought to be a more important goal than maximum possible performance.
That's going to depend on the application's intended use.
Dave
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests When (what java version) did those string operation optimizations happen? Sun's web page that talks about this (and explicitly says that string buffers are usually faster than direct string operations) doesn't mention a specific java version. Don't confuse a StringBuffer (the recommended type) with a byte array (what Chris was talking about). Since a String object is immutable, one should always use a StringBuffer (preferably a StringBuilder, these days) when you are constructing strings in a piecemeal fashion, then convert to String when complete. - Chuck
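The piecemeal-construction advice can be illustrated with a short sketch. The class name, method name, and sample field values are invented for the example and are not taken from the application under discussion.

```java
// Hypothetical helper: joins fields into one delimited record.
class LineAssembler {
    // Appends into a single StringBuilder and converts to String once at the end,
    // instead of creating a new immutable String on every concatenation.
    static String join(String[] fields, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values.
        System.out.println(join(new String[] {"site42", "2009-05-12", "17.3"}, ','));
    }
}
```

StringBuilder is unsynchronized, so it suits a buffer that is local to one method call; StringBuffer has the same API with synchronized methods for the rarer case where the buffer itself is shared between threads.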
RE: Performance with many small requests
From: David kerber [dcker...@verizon.net] My cpu usage for tomcat has gone from bouncing between 0 and 1 in task manager, to a steady 2 since more threads are now actually doing work instead of waiting around for their turn at the code, my disk writes per sec in perfmon have also more than doubled, and the destination log file is growing much faster as well. All excellent news. The fact that you've seen the performance double means that there was, in fact, a bottleneck there. Have you taken a new thread dump to see whether the locks (almost certainly on the log write) are still a problem? If so, you might have to go to a more complex scheme such as multiple log files managed by a pool manager. Don't even try to write the pool manager yourself; they're horribly messy things to get right and shake the race conditions out*. I half-remember Jakarta Commons has one that can be adapted if you get to that stage. Thanks a ton!!! No problem. - Peter * Yes, I did implement one. I still have the scars.
Re: Performance with many small requests
Peter Crowther wrote: From: David kerber [dcker...@verizon.net] My cpu usage for tomcat has gone from bouncing between 0 and 1 in task manager, to a steady 2 since more threads are now actually doing work instead of waiting around for their turn at the code, my disk writes per sec in perfmon have also more than doubled, and the destination log file is growing much faster as well. All excellent news. The fact that you've seen the performance double means that there was, in fact, a bottleneck there. Have you taken a new thread dump to see whether the locks (almost certainly on the log write) are still a problem? If so, you might have to go to a more complex scheme such as multiple log files managed by a pool manager. Don't even try to write the pool manager yourself; they're horribly messy things to get right and shake the race conditions out*. I half-remember Jakarta Commons has one that can be adapted if you get to that stage. From what I can tell now, it looks like most of my wait time is on socket reads. 
In the thread dump I took about 20 minutes ago, I didn't see any waiting on disk writes:
The line listed in this one is my inputStream.read():
[2009-05-11 08:20:09] [info] http-1024-Processor8
[2009-05-11 08:20:09] [info] daemon
[2009-05-11 08:20:09] [info] prio=6 tid=0x270e83c8
[2009-05-11 08:20:09] [info] nid=0xcd4
[2009-05-11 08:20:09] [info] runnable
[2009-05-11 08:20:09] [info] [0x2755f000..0x2755f9e4]
[2009-05-11 08:20:09] [info] at java.net.SocketInputStream.socketRead0(Native Method)
[2009-05-11 08:20:10] [info] at java.net.SocketInputStream.read(Unknown Source)
[2009-05-11 08:20:10] [info] at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:747)
[2009-05-11 08:20:10] [info] at org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:777)
[2009-05-11 08:20:10] [info] at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:115)
[2009-05-11 08:20:10] [info] at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:712)
[2009-05-11 08:20:10] [info] at org.apache.coyote.Request.doRead(Request.java:423)
[2009-05-11 08:20:10] [info] at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:283)
[2009-05-11 08:20:10] [info] at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:404)
[2009-05-11 08:20:10] [info] at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:298)
[2009-05-11 08:20:10] [info] at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:192)
[2009-05-11 08:20:10] [info] at eddsrv.EddRcvr.processRequest(EddRcvr.java:199)
[2009-05-11 08:20:10] [info] at eddsrv.EddRcvr.doPost(EddRcvr.java:94)
[2009-05-11 08:20:10] [info] at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
[2009-05-11 08:20:10] [info] at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
[2009-05-11 08:20:10] [info] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
[2009-05-11 08:20:10] [info] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
[2009-05-11 08:20:10] [info] at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
[2009-05-11 08:20:10] [info] at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
[2009-05-11 08:20:10] [info] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
[2009-05-11 08:20:10] [info] at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
[2009-05-11 08:20:10] [info] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
[2009-05-11 08:20:11] [info] at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
[2009-05-11 08:20:11] [info] at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
[2009-05-11 08:20:11] [info] at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:667)
[2009-05-11 08:20:11] [info] at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
[2009-05-11 08:20:11] [info] at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
[2009-05-11 08:20:11] [info] at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
[2009-05-11 08:20:11] [info] at java.lang.Thread.run(Unknown Source)
[2009-05-11 08:20:11] [info]
This one seems to be waiting on something in tomcat itself:
[2009-05-11 08:19:49] [info] http-1024-Processor45
[2009-05-11 08:19:49] [info] daemon
[2009-05-11 08:19:49] [info] prio=6 tid=0x26fa6f38
[2009-05-11 08:19:49] [info] nid=0x340
[2009-05-11
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests From what I can tell now, it looks like most of my wait time is on socket reads. In the thread dump I took about 20 minutes ago, I didn't see any waiting on disk writes: The line listed in this one is my inputStream.read(): Waiting for the body of the request to show up. This one seems to be waiting on something in tomcat itself: Waiting for the request header to show up. If that's all you're seeing in the thread dump, then it does look like the network is sluggish, as I think you mentioned before. You might try running Wireshark or equivalent to monitor the traffic and see just how long it takes for each segment of the message to be delivered to the server. - Chuck
Re: Performance with many small requests
Peter,
On 5/8/2009 7:26 AM, Peter Crowther wrote: Decrypt: parallel. Send ack: parallel. Increment counters: synced. Write to log file: synced (or you'll have some very odd stuff happening).
I'd go further and suggest that you re-factor your design so that your servlet is very simple. Something like this:
public void doPost(HttpServletRequest request, HttpServletResponse response)
    throws IOException, ServletException
{
    RequestCounter counter = ...; // get from app scope? Class-level?
    RequestLogger logger = ...; // same here
    RequestProcessor processor = ...; // same here
    counter.count();
    processor.processRequest(request, response, false);
    logger.log(request, response);
}
Then it's up to the RequestCounter to maintain its own synchronization (if necessary) instead of your servlet having to know the semantics of thread-safety, etc. Same with the logger. As someone mentioned, most logging frameworks handle synchronization for you, and most of them can buffer the output to their log files so that you are getting the best performance you can. I highly recommend using a logging framework, or developing something that meets your needs that is self-contained, can accept log entries from multiple concurrent clients (your servlets), and buffers output to the log file to keep performance up.
What is it that processRequest actually does? Decryption? Hmm... is it possible for you to save the decryption for later? You could have a service that simply logs the notifications and then have a batch job that later does the decryption and throws out all the incorrectly-encrypted data. Just another option.
Finally... if you are logging all requests, is it necessary to keep a daily and total request count? You can avoid the synchronization of those counters entirely by ... not bothering to count them. Again, retrospective counting is a possibility.
-chris
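One possible shape for a counter that owns its thread-safety, as suggested above, is sketched here. Only the name RequestCounter and its count() method appear in the message; the two AtomicLong fields, the getters, and resetDaily() are assumptions added for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical RequestCounter: the thread names the class but never shows its body.
class RequestCounter {
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong daily = new AtomicLong();

    // Called once per request; safe to call from any number of request threads.
    void count() {
        total.incrementAndGet();
        daily.incrementAndGet();
    }

    long getTotal() { return total.get(); }
    long getDaily() { return daily.get(); }

    // Returns the day's count and resets it to zero in one atomic step,
    // for example from a timer that fires at midnight (assumed, not from the thread).
    long resetDaily() { return daily.getAndSet(0); }

    public static void main(String[] args) {
        RequestCounter counter = new RequestCounter();
        counter.count();
        counter.count();
        System.out.println("daily=" + counter.resetDaily() + " total=" + counter.getTotal());
    }
}
```

The servlet then only calls counter.count() and needs no synchronized block of its own. The two fields are updated independently, so a monitoring read can briefly see one incremented and not the other, which should be acceptable for a progress indicator.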
Re: Performance with many small requests
Christopher Schultz wrote: Peter, On 5/8/2009 7:26 AM, Peter Crowther wrote: Decrypt: parallel. Send ack: parallel. Increment counters: synced. Write to log file: synced (or you'll have some very odd stuff happening).
I'd go further and suggest that you re-factor your design so that your servlet is very simple. Something like this:
public void doPost(HttpServletRequest request, HttpServletResponse response)
    throws IOException, ServletException
{
    RequestCounter counter = ...; // get from app scope? Class-level?
    RequestLogger logger = ...; // same here
    RequestProcessor processor = ...; // same here
    counter.count();
    processor.processRequest(request, response, false);
    logger.log(request, response);
}
Then it's up to the RequestCounter to maintain its own synchronization (if necessary) instead of your servlet having to know the semantics of thread-safety, etc. Same with the logger. As someone mentioned, most logging frameworks handle synchronization for you, and most of them can buffer the output to their log files so that you are getting the best performance you can. I highly recommend using a logging framework, or developing something that meets your needs that is self-contained, can accept log entries from multiple concurrent clients (your servlets), and buffers output to the log file to keep performance up.
I've been meaning to look into some more sophisticated logging techniques, and this exercise has given me some good incentive to do so sooner rather than later. However, it doesn't look at the moment like disk writes are a limiting factor in this app's performance. My latest thread dump indicates that the socket read is where most of the waits are. Because the requests are so small, I imagine that network latency is a far bigger factor than gross throughput is. I definitely should hook a profiler to the app so I can be sure of what's taking the time, though.
What is it that processRequest actually does? Decryption? Hmm...
is it possible for you to save the decryption for later? You could have a service that simply logs the notifications and then have a batch job that later does the decryption and throws out all the incorrectly-encrypted data. Just another option.
Basically the entire job of this application (servlet) is to accept the POSTs from the clients in the field, decrypt them, do a few sanity checks on the raw data, and dump them into a file on disk (we call it a cache file). There are separate apps that then continuously read the data from the cache file and do all kinds of processing on it, stuff it into a database, and check various values and trends for near-realtime alerting purposes. Moving the decryption to a later step in the process would be possible, but would require rewriting another application, for probably very little net gain. Early on in the design, we considered doing it all in one application, but felt that this method gave us a little more overall reliability, because one piece could go down without affecting the others, and then it could catch up when it came back up. It also allowed us to profile each section separately, making it a little easier to find the bottlenecks.
Finally... if you are logging all requests, is it necessary to keep a daily and total request count? You can avoid the synchronization of those counters entirely by ... not bothering to count them. Again, retrospective counting is a possibility.
The counting isn't a core requirement of the application; I just put it in as a way to help me monitor its progress during the day, to be sure it hasn't locked up or lost a network connection somewhere along the way. Incrementing a counter can't be much of a synchronization bottleneck, and if I switch to an AtomicInteger, it should be even less of one.
Thanks for the comments!
D
RE: Performance with many small requests
From: David Kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests Incrementing a counter can't be much of a synchronization bottleneck, and if I switch to an AtomicInteger, it should be even less of one. Actually, it won't. There's a slight performance difference between the two mechanisms, but it's usually in favor of the synchronized increment, not the AtomicInteger, at least on my dual-core AMD 64 system running JDK 6u12 in 64-bit server mode on Vista. The difference is only a few percent, so you should just code it whichever way you find more maintainable. (Test program available on request; it would be interesting to see if the same relationship exists on a modern Intel chip.) - Chuck
Re: Performance with many small requests
Xie Xiaodong wrote: Hello, IMHO, it would be better to use java concurrency package now than to use the old synchronize mechanism. The old mechanism is too low level and error prone. I think you could have a thread pool and some handler pattern to handle the request from your customer.
That is a massive overcomplication for this use case.
On 7-May-2009, at 19:05, David Kerber wrote: The synchronized section doesn't do a whole lot, so it doesn't take long to process. My question is, what kinds of operations need to be synchronized? All I do is decrypt the data from the POST,
You should be able to easily write the decryption (if it isn't already) in a multi-threaded manner.
send a small acknowledgement response back to the site,
Unlikely to have sync issues (but check to be sure)
and write the line to the log file.
If you are using a logging framework (like log4j) this will handle the necessary sync for you. Otherwise you may have to write it yourself. Incrementing the counters you are using needs to be synchronized. The simplest solution would be to use atomics.
Does that sound like something that would need to be synchronized?
So, some bits do but most don't. Getting rid of unnecessary syncs is a good thing but you really should find out where the bottleneck is before you start changing code.
Mark
RE: Performance with many small requests
From: David Kerber [mailto:dcker...@verizon.net] The synchronized section doesn't do a whole lot, so it doesn't take long to process. Indeed. So take a thread dump and see what's happening before making *any* changes to this key part. My question is, what kinds of operations need to be synchronized? All I do is decrypt the data from the POST, send a small acknowledgement response back to the site, and write the line to the log file. Does that sound like something that would need to be synchronized? If not, pulling that out would be a really easy test to see if it helps my performance issue. Decrypt: parallel. Send ack: parallel. Increment counters: synced. Write to log file: synced (or you'll have some very odd stuff happening). - Peter
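The "write to log file: synced" step can be narrowed so that only the append itself is serialized while decryption and the acknowledgement stay parallel. The sketch below assumes a single shared writer object; the class name, method names, and use of java.nio.file (Java 7 or later, so newer than the JVM in this thread) are illustrative choices, not the application's actual code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical shared writer; the application's real cache-file code is not shown in the thread.
class SyncLineWriter implements AutoCloseable {
    private final BufferedWriter out;

    SyncLineWriter(Path file) throws IOException {
        // Buffered output, appending to the file and creating it if missing.
        out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Only this short append holds the lock, so lines from different
    // request threads cannot interleave within a line.
    synchronized void writeLine(String line) throws IOException {
        out.write(line);
        out.newLine();
    }

    // Flushes buffered lines and closes the file.
    @Override
    public synchronized void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("cache", ".log");
        try (SyncLineWriter writer = new SyncLineWriter(file)) {
            writer.writeLine("example line");
        }
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8).size() + " line(s) written");
    }
}
```

Because the writer buffers, lines reach the disk only when the buffer fills or the writer is closed; a reader that tails the file in near-realtime would need an explicit flush policy, which this sketch leaves out.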
Re: Performance with many small requests
Mark Thomas wrote: Xie Xiaodong wrote: Hello, IMHO, it would be better to use java concurrency package now than to use the old synchronize mechanism. The old mechanism is too low level and error prone. I think you could have a thread pool and some handler pattern to handle the request from your customer.
That is a massive overcomplication for this use case.
That was my thought as well, but I don't know enough about the subject of concurrency and synchronization to be sure.
On 7-May-2009, at 19:05, David Kerber wrote: The synchronized section doesn't do a whole lot, so it doesn't take long to process. My question is, what kinds of operations need to be synchronized? All I do is decrypt the data from the POST,
You should be able to easily write the decryption (if it isn't already) in a multi-threaded manner.
To that end, am I correct in understanding that any variables and objects that are declared locally to the method that does the work (such as the decryption routine) are going to be inherently thread safe? And that variables and objects declared at the class level (such as my counters) may not be? That's what the reading I did last night seemed to indicate, without explicitly stating so.
send a small acknowledgement response back to the site,
Unlikely to have sync issues (but check to be sure)
and write the line to the log file.
If you are using a logging framework (like log4j) this will handle the necessary sync for you. Otherwise you may have to write it yourself.
I'm doing it with the standard text file methods, but all the objects and variables are local to the method that processes the request.
Incrementing the counters you are using needs to be synchronized. The simplest solution would be to use atomics.
I had never heard of them before I was reading about this yesterday, but it looks like a good possibility.
Does that sound like something that would need to be synchronized?
So, some bits do but most don't.
Getting rid of unnecessary syncs is a good thing but you really should find out where the bottleneck is before you start changing code. That's my goal, but as far as I can tell right now, the bottleneck is narrowed down to either my code, or the customer's network, and testing some fixes to my code is pretty easy. Thanks for the comments! Dave - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
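The "atomics" suggestion above could look roughly like the following. This is only a minimal sketch: the field names totalReqCount and dailyReqCount are borrowed from the servlet code quoted elsewhere in the thread, while the class RequestCounters and its methods are hypothetical, and the lambda syntax assumes Java 8 or later (the java.util.concurrent.atomic classes themselves exist from Java 5).

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for the two request counters; not the poster's real code.
class RequestCounters {
    private final AtomicLong totalReqCount = new AtomicLong();
    private final AtomicLong dailyReqCount = new AtomicLong();

    // Safe to call from any number of request threads without a synchronized block.
    void recordRequest() {
        totalReqCount.incrementAndGet();
        dailyReqCount.incrementAndGet();
    }

    long total() { return totalReqCount.get(); }
    long daily() { return dailyReqCount.get(); }

    // Resets the daily counter (for example at midnight) and returns the old value.
    long resetDaily() { return dailyReqCount.getAndSet(0); }

    // Demo helper: increments from several threads and returns the final total.
    static long demo(int threads, int perThread) {
        RequestCounters c = new RequestCounters();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.recordRequest();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.total();
    }

    public static void main(String[] args) {
        System.out.println(demo(8, 10_000));
    }
}
```

Each counter is updated atomically on its own; the two are not updated as a single unit, which is probably fine for statistics but worth knowing if a reader ever needs both values to be mutually consistent.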
Re: Performance with many small requests
Peter Crowther wrote: From: David Kerber [mailto:dcker...@verizon.net] The synchronized section doesn't do a whole lot, so it doesn't take long to process. Indeed. So take a thread dump and see what's happening before making *any* changes to this key part. I'm trying; if I use tomcat5w.exe to take a thread dump, where does it leave the file? I can't find it, and it doesn't seem to put it on the clipboard either. D
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests if I use tomcat5w.exe to take a thread dump, where does it leave the file?

If you can take a thread dump with tomcat5w.exe, please let us know how, because I'm certainly not aware of it having such a capability. The platform-independent method is to use jps to find the process id of the Tomcat instance, then jstack to create the thread dump. These tools are part of the JDK from 1.5 onwards; documentation is on the java.sun.com web site, but you probably won't need the doc.

- Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Re: Performance with many small requests
Caldarale, Charles R wrote: From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests if I use tomcat5w.exe to take a thread dump, where does it leave the file? If you can take a thread dump with tomcat5w.exe, please let us know how, because I'm certainly not aware of it having such a capability.

If you right-click on the icon in the system tray, one of the items says Thread dump.

The platform-independent method is to use jps to find the process id of the Tomcat instance, then jstack to create the thread dump. These tools are part of the JDK from 1.5 onwards; documentation is on the java.sun.com web site, but you probably won't need the doc.

Ok, I'll have to install the jdk then; I've only got the jre installed on the server.

D
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests If you right-click on the icon in the system tray, one of the items says Thread dump.

Right - sorry for forgetting that. I never install from the .exe download (too restrictive), so I never have the icon present. I was thinking about the tomcat5w.exe GUI.

- Chuck
Re: Performance with many small requests
Caldarale, Charles R wrote: From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests If you right-click on the icon in the system tray, one of the items says Thread dump. Right - sorry for forgetting that. I never install from the .exe download (too restrictive), so I never have the icon present. I was thinking about the tomcat5w.exe GUI.

I've gotten in the habit of doing a double install: install from the .exe, and then extract the .zip distro on top of it. That said, any idea where that might leave the thread dump?

Dave
Re: Performance with many small requests
Peter Crowther wrote: From: David Kerber [mailto:dcker...@verizon.net] The synchronized section doesn't do a whole lot, so it doesn't take long to process. Indeed. So take a thread dump and see what's happening before making *any* changes to this key part. My question is, what kinds of operations need to be synchronized? All I do is decrypt the data from the POST, send a small acknowledgement response back to the site, and write the line to the log file. Does that sound like something that would need to be synchronized? If not, pulling that out would be a really easy test to see if it helps my performance issue.

Decrypt: parallel.
Send ack: parallel.
Increment counters: synced.
Write to log file: synced (or you'll have some very odd stuff happening).

Would a single thread executor service alongside an atomic counter be useful here? (my concurrency knowledge isn't so hot). I'm not sure if a) this is suitable or b) if it would solve the problem, as you may still end up with a delayed write to the log during peaky periods - at least they'd be in the right order though.

You could be dumping runnables into it during the post which would return quickly for the next request. You'd have to consider
exec.shutdown()
exec.shutdownNow()
in your servlet destroy to ensure you didn't drop data during a shutdown or app restart

p

- Peter
Re: Performance with many small requests
Pid wrote: Peter Crowther wrote: From: David Kerber [mailto:dcker...@verizon.net] The synchronized section doesn't do a whole lot, so it doesn't take long to process. Indeed. So take a thread dump and see what's happening before making *any* changes to this key part. My question is, what kinds of operations need to be synchronized? All I do is decrypt the data from the POST, send a small acknowledgement response back to the site, and write the line to the log file. Does that sound like something that would need to be synchronized? If not, pulling that out would be a really easy test to see if it helps my performance issue. Decrypt: parallel. Send ack: parallel. Increment counters: synced. Write to log file: synced (or you'll have some very odd stuff happening).

Would a single thread executor service alongside an atomic counter be useful here? (my concurrency knowledge isn't so hot). I'm not sure if a) this is suitable or b) if it would solve the problem, as you may still end up with a delayed write to the log during peaky periods - at least they'd be in the right order though.

The order is the only thing that's important; a short delay (up to a few tens of seconds) is no problem. Also, right now I'm doing a .flush() after the .write() to the log file. Is that usually necessary, other than to avoid losing data lines in case of a system failure? A few lost lines, while not desirable, isn't too big of a problem in this particular application. How would a .flush() affect the speed of returning from a synchronized .write()?

You could be dumping runnables into it during the post which would return quickly for the next request. You'd have to consider

You lost me here...

exec.shutdown() exec.shutdownNow() in your servlet destroy to ensure you didn't drop data during a shutdown or app restart

Dave
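For readers who were also lost by the executor idea, here is a rough sketch of what it might mean. Everything in it is an assumption for illustration: the class QueuedLineLogger, its method names, the 10-second timeout and the in-memory StringBuffer standing in for the log file are all hypothetical. Request threads hand each log line to a single background thread, which writes them in submission order; the servlet's destroy() would call shutdown() so queued lines are not dropped.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not code from the thread: one background thread owns the log.
class QueuedLineLogger {
    private final ExecutorService exec = Executors.newSingleThreadExecutor();
    // Stand-in for the real log file; StringBuffer is internally synchronized.
    private final StringBuffer sink = new StringBuffer();

    // Called from request threads: queues the write and returns immediately.
    void log(final String line) {
        exec.execute(() -> sink.append(line).append('\n'));
    }

    // Called once at shutdown (e.g. from destroy()): drains queued writes first.
    String shutdown() {
        exec.shutdown(); // stop accepting new tasks, keep running queued ones
        try {
            if (!exec.awaitTermination(10, TimeUnit.SECONDS)) {
                exec.shutdownNow(); // give up: lines still queued are dropped
            }
        } catch (InterruptedException e) {
            exec.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        QueuedLineLogger logger = new QueuedLineLogger();
        logger.log("first");
        logger.log("second");
        System.out.print(logger.shutdown());
    }
}
```

Lines come out in the order they were queued, which matches the stated requirement that order matters more than latency. Whether the extra moving parts are worth it for this application is a separate question.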
RE: Performance with many small requests
-Original Message- From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests That said, any idea where that might leave the thread dump? After some experimentation, I found it in jakarta_service_MMDD.log in Tomcat's logs directory. - Chuck
RE: Performance with many small requests
From: Pid [mailto:p...@pidster.com] Subject: Re: Performance with many small requests Would a single thread executor service alongside an atomic counter be useful here? (my concurrency knowledge isn't so hot). Sounds like overkill just for ordering. Synchronization with the single thread doing the logging work would still be necessary, so nothing's really gained. You could be dumping runnables into it during the post which would return quickly for the next request. You'd have to consider exec.shutdown() exec.shutdownNow() in your servlet destroy to ensure you didn't drop data during a shutdown or app restart Way, way too much complexity for the problem at hand. - Chuck
Re: Performance with many small requests
Caldarale, Charles R wrote: -Original Message- From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests That said, any idea where that might leave the thread dump? After some experimentation, I found it in jakarta_service_MMDD.log in Tomcat's logs directory.

Apparently it doesn't spit out the thread dump if the logging level is set to error, because I had looked there, and looked again just now (in case it took longer than I expected). When I get a chance to restart the service, I'll change the logging level and try to get a dump.

D
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Also, right now I'm doing a .flush() after the .write() to the log file. Is that usually necessary, other than to avoid losing data lines in case of a system failure?

No, other than that. What disk subsystem are you running on? Start Performance Monitor and, from Physical Disks, monitor your disk writes per second. If it's over 150(ish, depending on the disk) per spindle in your disk array, you're saturating your disks.

How would a .flush() affect the speed of returning from a synchronized .write()?

It can be significant, as the data has to get to the file. I'd check the above. Also, do you have any battery-backed write cache (BBWC) on the disk subsystem and how's it configured? On systems where disk has proved to be the bottleneck, and there are many small pieces of data being written, I've seen better than a factor of 10 improvement by adding write cache in this way.

- Peter
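If per-write flushing does turn out to matter, one software-side option is to flush every N lines rather than after every line. The sketch below only illustrates that trade-off: the class BatchFlushLog and its flushEvery parameter are hypothetical, and a crash can lose up to flushEvery - 1 buffered lines, which the original poster indicated would be tolerable.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical wrapper: flushes every N lines instead of after every write.
class BatchFlushLog {
    private final BufferedWriter out;
    private final int flushEvery;
    private int sinceFlush;

    BatchFlushLog(Writer target, int flushEvery) {
        this.out = new BufferedWriter(target);
        this.flushEvery = flushEvery;
    }

    // Synchronized so that lines from different request threads never interleave.
    synchronized void writeLine(String line) {
        try {
            out.write(line);
            out.write('\n');
            if (++sinceFlush >= flushEvery) {
                out.flush(); // push the whole batch to the underlying writer
                sinceFlush = 0;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Closing writes out whatever is still buffered.
    synchronized void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter target = new StringWriter(); // stand-in for a FileWriter
        BatchFlushLog log = new BatchFlushLog(target, 2);
        log.writeLine("a");
        log.writeLine("b");
        log.writeLine("c");
        System.out.println("visible before close: " + target.toString().length() + " chars");
        log.close();
        System.out.println("visible after close: " + target.toString().length() + " chars");
    }
}
```

With a real file, the point at which data actually reaches the disk also depends on the OS and any controller cache, so measuring with Performance Monitor as suggested above is still the deciding step.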
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests Apparently it doesn't spit out the thread dump if the logging level is set to error, because I had looked there, and looked again just now (in case it took longer than I expected). Mine is set to error and the thread dump appears as expected; any setting of info or better should display it. - Chuck
Re: Performance with many small requests
Peter Crowther wrote: From: David kerber [mailto:dcker...@verizon.net] Also, right now I'm doing a .flush() after the .write() to the log file. Is that usually necessary, other than to avoid losing data lines in case of a system failure? No, other than that. What disk subsystem are you running on? Start Performance Monitor and, from Physical Disks, monitor your disk writes per second. If it's over 150(ish, depending on the disk) per spindle in your disk array, you're saturating your disks. I don't recall the exact disk configuration, but it's pretty robust and on par with the rest of the system, because this server was originally spec'd as a combination file and application server. How would a .flush() affect the speed of returning from a synchronized .write()? It can be significant, as the data has to get to the file. I'd check the above. Also, do you have any battery-backed write cache (BBWC) on the disk subsystem and how's it configured? On systems where disk has proved to be the bottleneck, and there are many small pieces of data being written, I've seen better than a factor of 10 improvement by adding write cache in this way. I'll look into that to be sure, but I don't think the HD is limiting. D
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] I'll look into that to be sure, but I don't think the HD is limiting. I think I agree with you, but it's a classic area that people miss - Intel have done entirely too good a job of branding the CPU as the only place where speed matters! - Peter
Re: Performance with many small requests
David kerber wrote: Caldarale, Charles R wrote: -Original Message- From: David kerber [mailto:dcker...@verizon.net] Subject: Re: Performance with many small requests That said, any idea where that might leave the thread dump? After some experimentation, I found it in jakarta_service_MMDD.log in Tomcat's logs directory. Apparently it doesn't spit out the thread dump if the logging level is set to error, because I had looked there, and looked again just now (in case it took longer than I expected). When I get a chance to restart the service, I'll change the logging level and try to get a dump. D

Now that I've got a thread dump, what am I looking for? I've got a bunch of sections like this, pretty much all of which are waiting to lock 0x057c73e0. Is there any way to figure out what that object is? I imagine it's the disk write, but can't figure out how to tell for sure.

[2009-05-08 10:43:24] [info] http-1024-Processor1
[2009-05-08 10:43:24] [info] daemon
[2009-05-08 10:43:24] [info] prio=6 tid=0x26d0fe70
[2009-05-08 10:43:24] [info] nid=0x115c
[2009-05-08 10:43:24] [info] waiting for monitor entry
[2009-05-08 10:43:24] [info] [0x2739f000..0x2739fb64]
[2009-05-08 10:43:24] [info] at eddsrv.EddRcvr.doPost(EddRcvr.java:70)
[2009-05-08 10:43:24] [info] - waiting to lock 0x057c73e0 (a eddsrv.EddRcvr)
[2009-05-08 10:43:24] [info] at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
[2009-05-08 10:43:24] [info] at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
[2009-05-08 10:43:24] [info] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
[2009-05-08 10:43:24] [info] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
[2009-05-08 10:43:24] [info] at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
[2009-05-08 10:43:24] [info] at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
[2009-05-08 10:43:24] [info] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
[2009-05-08 10:43:24] [info] at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
[2009-05-08 10:43:24] [info] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
[2009-05-08 10:43:24] [info] at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
[2009-05-08 10:43:25] [info] at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
[2009-05-08 10:43:25] [info] at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:667)
[2009-05-08 10:43:25] [info] at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
[2009-05-08 10:43:25] [info] at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
[2009-05-08 10:43:25] [info] at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
[2009-05-08 10:43:25] [info] at java.lang.Thread.run(Unknown Source)
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] Now that I've got a thread dump, what am I looking for?

You found it first time :-). Now the hard part - fixing it.

I've got a bunch of sections like this, pretty much all of which are waiting to lock 0x057c73e0. Is there any way to figure out what that object is? I imagine it's the disk write, but can't figure out how to tell for sure.

It's the sync at the start of your method.

[2009-05-08 10:43:24] [info] waiting for monitor entry
[2009-05-08 10:43:24] [info] [0x2739f000..0x2739fb64]
[2009-05-08 10:43:24] [info] at eddsrv.EddRcvr.doPost(EddRcvr.java:70)
[2009-05-08 10:43:24] [info] - waiting to lock 0x057c73e0 (a eddsrv.EddRcvr)

... so they're all waiting to get the monitor on a eddsrv.EddRcvr, which is what the synchronized on your doPost method will lock on. If you say pretty much all are stuck there, then you have massive contention on that monitor. Time to move to some finer-grained locking!

As a first step, I'd remove the synchronized from the method; I'd replace it with one lock around the counter updates (locked on one object) and another lock in your decrypt/log/respond code that's purely around the logging section (locked on a different object). Then I'd re-evaluate - run, take another thread dump and see where the bottlenecks are now. If they're anywhere, I'll bet they're around the logging code.

- Peter
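The two-lock arrangement described above might be sketched as follows. This is not the poster's EddRcvr servlet: the class FineGrainedReceiver, the lock field names, the handle() method and the string-reversing decode() placeholder (standing in for the real decryption) are all assumptions made for illustration. The counter names follow the code quoted in the thread.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for the servlet, showing one lock per shared resource.
class FineGrainedReceiver {
    private final Object counterLock = new Object(); // guards the two counters only
    private final Object logLock = new Object();     // guards the log writer only
    private long totalReqCount;
    private long dailyReqCount;
    private final Writer log;

    FineGrainedReceiver(Writer log) { this.log = log; }

    // Not synchronized as a whole: only the two short critical sections are.
    void handle(String payload) {
        synchronized (counterLock) {
            totalReqCount++;
            dailyReqCount++;
        }
        String line = decode(payload); // uses local data only, so it runs in parallel
        synchronized (logLock) {
            try {
                log.write(line + "\n");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Placeholder for the real decryption: simply reverses the payload.
    static String decode(String payload) {
        return new StringBuilder(payload).reverse().toString();
    }

    long total() {
        synchronized (counterLock) { return totalReqCount; }
    }

    // Demo helper: sends requests from several threads and returns the final count.
    static long demo(int threads, int perThread, Writer log) {
        FineGrainedReceiver r = new FineGrainedReceiver(log);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) r.handle("cba");
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return r.total();
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        System.out.println(demo(4, 1000, out) + " requests logged");
    }
}
```

Using two separate lock objects means a thread updating the counters never waits for a thread that is writing to the log, and decoding is not serialized at all.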
Re: Performance with many small requests
Peter Crowther wrote: From: David kerber [mailto:dcker...@verizon.net] Now that I've got a thread dump, what am I looking for? You found it first time :-). Now the hard part - fixing it.

Yeah, that's what I figured!

I've got a bunch of sections like this, pretty much all of which are waiting to lock 0x057c73e0. Is there any way to figure out what that object is? I imagine it's the disk write, but can't figure out how to tell for sure. It's the sync at the start of your method.

[2009-05-08 10:43:24] [info] waiting for monitor entry
[2009-05-08 10:43:24] [info] [0x2739f000..0x2739fb64]
[2009-05-08 10:43:24] [info] at eddsrv.EddRcvr.doPost(EddRcvr.java:70)
[2009-05-08 10:43:24] [info] - waiting to lock 0x057c73e0 (a eddsrv.EddRcvr)

I also have quite a few blocks like this:

[2009-05-08 10:43:23] [info] http-1024-Processor10
[2009-05-08 10:43:23] [info] daemon
[2009-05-08 10:43:23] [info] prio=6 tid=0x271f1418
[2009-05-08 10:43:23] [info] nid=0xa74
[2009-05-08 10:43:23] [info] in Object.wait()
[2009-05-08 10:43:23] [info] [0x275df000..0x275dfae4]
[2009-05-08 10:43:23] [info] at java.lang.Object.wait(Native Method)
[2009-05-08 10:43:23] [info] at java.lang.Object.wait(Unknown Source)
[2009-05-08 10:43:23] [info] at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:656)
[2009-05-08 10:43:23] [info] - locked 0x0510e6e0 (a org.apache.tomcat.util.threads.ThreadPool$ControlRunnable)
[2009-05-08 10:43:23] [info] at java.lang.Thread.run(Unknown Source)
[2009-05-08 10:43:23] [info]

I assume these are just threads waiting for something to do (waiting for a request)?

... so they're all waiting to get the monitor on a eddsrv.EddRcvr, which is what the synchronized on your doPost method will lock on.

Until you said that, I didn't even notice that I had what appear to be double synchronizations, making the method synchronized, and also having synchronized{} blocks inside it. I assume I've been double-screwing myself all this time??

protected synchronized void doPost(HttpServletRequest request, HttpServletResponse response ) throws ServletException, IOException {
    synchronized ( criticalProcess ) {
        totalReqCount++;
        dailyReqCount++;
        processRequest( request, response, false );
    }
}

If you say pretty much all are stuck there, then you have massive contention on that monitor. Time to move to some finer-grained locking! As a first step, I'd remove the synchronized from the method; I'd replace it with one lock around the counter updates (locked on one object) and another lock in your decrypt/log/respond code that's purely around the logging section (locked on a different object). Then I'd re-evaluate - run, take another thread dump and see where the bottlenecks are now. If they're anywhere, I'll bet they're around the logging code.

Thanks!

D
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] I also have quite a few blocks like this: [...] [2009-05-08 10:43:23] [info] - locked 0x0510e6e0 (a org.apache.tomcat.util.threads.ThreadPool$ControlRunnable) [...] I assume these are just threads waiting for something to do (waiting for a request)? They look like spares in the pool, but my knowledge of Tomcat's internals is limited. Until you said that, I didn't even notice that I had what appear to be double synchronizations, making the method synchronized, and also having synchronized{} blocks inside it. I assume I've been double-screwing myself all this time?? Yeah, I did raise an eyebrow when I saw it. It'll take a few CPU cycles per request, but no more than that. Only one thread can get into the method, so any internal syncs just add overhead. They're not further places for contention to occur. - Peter
RE: Performance with many small requests
From: Peter Crowther [mailto:peter.crowt...@melandra.com] Subject: RE: Performance with many small requests They look like spares in the pool, but my knowledge of Tomcat's internals is limited. Yes, they are just waiting for requests to show up. Only one thread can get into the method Strictly speaking, that's one thread per *servlet* object; if using the SingleThreadModel (let's hope not), the container is allowed to create multiple instances. They're not further places for contention to occur. Depending on what else uses the criticalProcess object, that may or may not be true. Regardless, synchronizing on the method is very likely a complete waste of time. - Chuck
Re: Performance with many small requests
Caldarale, Charles R wrote: From: Peter Crowther [mailto:peter.crowt...@melandra.com] Subject: RE: Performance with many small requests They look like spares in the pool, but my knowledge of Tomcat's internals is limited. Yes, they are just waiting for requests to show up. Only one thread can get into the method Strictly speaking, that's one thread per *servlet* object; if using the SingleThreadModel (let's hope not), the container is allowed to create multiple instances. They're not further places for contention to occur. Depending on what else uses the criticalProcess object, that may or may not be true. Regardless, synchronizing on the method is very likely a complete waste of time. Thanks for confirming that. Dave
RE: Performance with many small requests
From: Caldarale, Charles R [mailto:chuck.caldar...@unisys.com] Strictly speaking, that's one thread per *servlet* object; if using the SingleThreadModel (let's hope not), the container is allowed to create multiple instances. Good point in the general case, but I rather suspect David would have seen very different performance characteristics and some *very* confused output if that were the case here. They're not further places for contention to occur. Depending on what else uses the criticalProcess object, that may or may not be true. Another good point. I was assuming something that isn't necessarily true, namely that criticalProcess was created for just that sync block. Meh, why don't I bow out and leave Chuck to give all the good answers? ;-) - Peter
RE: Performance with many small requests
From: Peter Crowther [mailto:peter.crowt...@melandra.com] Subject: RE: Performance with many small requests Meh, why don't I bow out and leave Chuck to give all the good answers? A) I don't have them all. B) What I do have is meetings, bloody meetings and won't be answering promptly. :-( - Chuck
Re: Performance with many small requests
Peter Crowther wrote: From: Caldarale, Charles R [mailto:chuck.caldar...@unisys.com] Strictly speaking, that's one thread per *servlet* object; if using the SingleThreadModel (let's hope not), the container is allowed to create multiple instances. Good point in the general case, but I rather suspect David would have seen very different performance characteristics and some *very* confused output if that were the case here. They're not further places for contention to occur. Depending on what else uses the criticalProcess object, that may or may not be true. Another good point. I was assuming something that isn't necessarily true, namely that criticalProcess was created for just that sync block. Which, of course, is a correct assumption. Meh, why don't I bow out and leave Chuck to give all the good answers? ;-) You've done great so far: I implemented your suggestions, and it looks like it has more than doubled my throughput! My cpu usage for tomcat has gone from bouncing between 0 and 1 in task manager, to a steady 2 since more threads are now actually doing work instead of waiting around for their turn at the code, my disk writes per sec in perfmon have also more than doubled, and the destination log file is growing much faster as well. Thanks a ton!!! Dave
RE: Performance with many small requests
From: David kerber [mailto:dcker...@verizon.net] The tomcat application simply takes the post request, does a checksum verification of it, decrypts the lightly-encrypted data, and writes it to a log file with the timestamps and site identifiers I mentioned above. Pretty simple processing, and it is all inside a synchronized{} construct:

protected synchronized void doPost(HttpServletRequest request, HttpServletResponse response ) throws ServletException, IOException {
    synchronized ( criticalProcess ) {
        totalReqCount++;
        dailyReqCount++;
        processRequest( request, response, false );
    }
}

Doesn't the synchronized in the above mean that you're essentially single-threading Tomcat? So you have all this infrastructure... and that sync may well be the bottleneck. You could detect this by taking a thread dump in the middle of the day, and seeing whether a significant number of threads were waiting on either of your sync objects. If there are a significant number, consider re-engineering this critical piece of your application to be multi-threaded :-).

- Peter
Re: Performance with many small requests
Andre-John Mas wrote: On 7-May-2009, at 17:28, Peter Crowther wrote: From: David kerber [mailto:dcker...@verizon.net] The tomcat application simply takes the post request, does a checksum verification of it, decrypts the lightly-encrypted data, and writes it to a log file with the timestamps and site identifiers I mentioned above. Pretty simple processing, and it is all inside a synchronized{} construct:

protected synchronized void doPost(HttpServletRequest request, HttpServletResponse response ) throws ServletException, IOException {
    synchronized ( criticalProcess ) {
        totalReqCount++;
        dailyReqCount++;
        processRequest( request, response, false );
    }
}

Doesn't the synchronized in the above mean that you're essentially single-threading Tomcat? So you have all this infrastructure... and that sync may well be the bottleneck.

That would be my impression too. It is best to avoid making the synchronized scope so large, unless there is a very good reason. David, do you have any reason for this? Beyond the counter, what other stuff do you synchronise? Also, it has generally been recommended to me to avoid hitting the disk in every request, since you may end up with an I/O bottleneck, so if you can write the logs in batches you will have better performance. If you know that you are only going to have very few users at a time (say, less than 10), it may not be worth the time optimising this, but if you know that you are going to get at least several hundred, then this is something to watch out for.

Thanks for the comments, Andre-John and Peter. When I wrote that app, I didn't know as much as I do now, but I'm still not very knowledgeable about synchronized operations. The synchronized section doesn't do a whole lot, so it doesn't take long to process. My question is, what kinds of operations need to be synchronized? All I do is decrypt the data from the POST, send a small acknowledgement response back to the site, and write the line to the log file. Does that sound like something that would need to be synchronized? If not, pulling that out would be a really easy test to see if it helps my performance issue.

Thanks!

D
Re: Performance with many small requests
On 7-May-2009, at 19:05, David Kerber wrote:

Andre-John Mas wrote:

That would be my impression too. It is best to avoid making the synchronized scope so large, unless there is a very good reason. David, do you have any reason for this? Beyond the counter, what other stuff do you synchronise?

Also, it has generally been recommended to me to avoid hitting the disk in every request, since you may end up with an I/O bottleneck, so if you can write the logs in batches you will have better performance.

If you know that you are only going to have very few users at a time (say, less than 10), it may not be worth the time optimising this, but if you know that you are going to get at least several hundred, then this is something to watch out for.

Thanks for the comments, Andre-John and Peter. When I wrote that app, I didn't know as much as I do now, but I'm still not very knowledgeable about synchronized operations. The synchronized section doesn't do a whole lot, so it doesn't take long to process.

My question is, what kinds of operations need to be synchronized? All I do is decrypt the data from the POST, send a small acknowledgement response back to the site, and write the line to the log file. Does that sound like something that would need to be synchronized? If not, pulling that out would be a really easy test to see if it helps my performance issue.

I am no expert in this myself, but I know enough to help me out in most day to day scenarios. What you should be reading up on is concurrency in Java. A few useful resources:

site: http://java.sun.com/docs/books/tutorial/essential/concurrency/
book: http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601

I actually bought the book myself and find it a handy reference.

What I can say is that any time two threads are likely to access the same object, and at least one of them may modify it, you will need to synchronize access to that object. If the object is only going to be read during the life of the unit of work, then you need not synchronize it. You shouldn't simply use the synchronized keyword as a magical solve-all for threading issues; instead you need to understand the nature of the interactions between the threads, if any. In certain cases it is actually better to duplicate the necessary resources, have each thread work on its copy and then synchronize the value at the end.

In the case of your code, you should ask what the shared objects are that are going to be modified by the threads. You should also look at whether it is even necessary for the objects to be shared. Also consider whether, for the call cycle, the objects you are going to modify are only available on the stack, as opposed to being a class or instance member.

To give you a real world analogy: consider a home that is being built and you have an electrician and a plumber:

- is it better to have one wait until the other is finished (serial execution)?
- is it possible for them to be working on different stuff and not be stepping on each other's feet? (parallel execution)
- if you need them to work at the same time, what is the cost of coordinating each other so that they do not interfere with the other? (synchronization issues)

In many ways multi-threading is not much different, and you should be asking yourself the same type of questions.

André-John
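The idea of letting each thread work on its own copy and synchronizing only at the end can be sketched as follows. This is an illustration under assumptions, not code from the thread: the class LocalTally, the method sum, and the summing task itself are hypothetical and chosen only because they are easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: each worker accumulates into a local variable
// that lives on its own stack, and touches shared state exactly once.
class LocalTally {
    static long sum(int[] values, int workers) {
        AtomicLong shared = new AtomicLong();               // the only shared mutable object
        int chunk = (values.length + workers - 1) / workers; // slice size, rounded up
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int from = Math.min(values.length, w * chunk);
            final int to = Math.min(values.length, from + chunk);
            Thread t = new Thread(() -> {
                long local = 0;                             // thread-confined, needs no lock
                for (int i = from; i < to; i++) {
                    local += values[i];                     // the array is only read, never written
                }
                shared.addAndGet(local);                    // single coordination point per worker
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return shared.get();
    }
}
```

The input array is shared but only read, so it needs no synchronization; the running total is modified, but each thread modifies only its own local copy until the final addAndGet. That mirrors the plumber and the electrician working in different rooms and meeting once at the end.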
Re: Performance with many small requests
Hello,

IMHO, it would be better to use the Java concurrency package (java.util.concurrent) now than to use the old synchronized mechanism. The old mechanism is too low-level and error-prone. I think you could have a thread pool and some handler pattern to handle the requests from your customer.

--
Sincerely yours and Best Regards,
Xie Xiaodong
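As one possible reading of the java.util.concurrent suggestion, combined with the earlier advice to write the log in batches, here is a minimal sketch in which request threads only enqueue lines and a single background thread does all the writing. The class name BatchingLogWriter and its methods are hypothetical, and a single writer thread stands in for the thread pool mentioned above; it is a sketch under those assumptions, not a drop-in replacement for the application's logging.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical batching logger: producers enqueue, one thread writes.
class BatchingLogWriter implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Writer out;
    private final Thread writerThread;
    private volatile boolean running = true;

    BatchingLogWriter(Writer out) {
        this.out = out;
        this.writerThread = new Thread(this::drainLoop, "log-writer");
        this.writerThread.start();
    }

    // Called by request threads: no I/O happens here, only an enqueue.
    void log(String line) {
        queue.add(line);
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>();
        try {
            // Keep going until close() was called and the queue is empty.
            while (running || !queue.isEmpty()) {
                String first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue;                 // nothing arrived; re-check the loop condition
                }
                batch.add(first);
                queue.drainTo(batch);         // grab everything else that is waiting
                for (String line : batch) {
                    out.write(line + "\n");
                }
                out.flush();                  // one flush per batch, not per request
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stops accepting the loop's "running" state and waits for queued lines to be written.
    @Override
    public void close() {
        running = false;
        try {
            writerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because only the writer thread touches the Writer, the file itself needs no lock, and the order of lines follows the order in which they were enqueued. Lines passed to log() after close() has returned are not written in this sketch, and an unbounded queue can grow without limit if the disk cannot keep up; a bounded queue would be the usual next step.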