Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
Ayub, On 11/12/20 11:20, Ayub Khan wrote: Chris, That's correct, it's just a plain static hello world page I created to verify tomcat. It is served by tomcat. I have bundled this page in the same context where the service is running. When I create load on the service and then try to access the static hello world page browser keeps busy and does not return the page. I checked the database dashboard and the monitoring charts are normal, no spikes on cpu or any other resources of the database. The delay is noticeable when there are more than 1000 concurrent requests from each of 4 different JMeter test instances That's 4000 concurrent requests. Your connector only has 2000 threads, so only 2000 requests can be processed simultaneously. You have a keepalive timeout of 6 seconds (6000ms) and I'm guessing your load test doesn't actually use KeepAlive. Why does tomcat not even serve the html page I think the keepalive timeout explains what you are seeing. Are you instructing JMeter to re-use connections and also use KeepAlive? What happens if you set the KeepAlive timeout to 1 second instead of 6? Does that improve things? -chris On Thu, Nov 12, 2020 at 7:01 PM Christopher Schultz < ch...@christopherschultz.net> wrote: Ayub, On 11/12/20 10:47, Ayub Khan wrote: Chris, I am using hikaricp connection pooling and the maximum pool size is set to 100, without specifying minimum idle connections. Even during high load I see there are more than 80 connections in idle state. I have setup debug statements to print the total time taken to complete the request. The response time of completed call during load is around 5 seconds, the response time without load is around 400 to 500 milliseconds That's a significant difference. Is your database server showing high CPU usage or more I/O usage during those high-load times? During the load I cannot even access static html page Now *that* is an interesting data point. You are sure that the "static" request doesn't hit any other resources?
No filter is doing anything? No logging to an external service or double-checking any security constraints in the db before serving the page? (And the static page is being returned by Tomcat, not nginx, right?) -chris On Thu, Nov 12, 2020 at 4:59 PM Christopher Schultz < ch...@christopherschultz.net> wrote: Ayub, On 11/11/20 16:16, Ayub Khan wrote: I was load testing using the ec2 load balancer dns. I have increased the connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I am not seeing connection timeout in nginx logs now. No errors in kernel.log I am not seeing any errors in tomcat catalina.out. The timeouts are most likely related to the connection timeout (and therefore keepalive) setting. If you are proxying connections from nginx and they should be staying open, you should really never be experiencing a timeout between nginx and Tomcat. During regular operations when the request count is between 4 to 6k requests per minute the open files count for the tomcat process is between 200 to 350. Responses from tomcat are within 5 seconds. Good. If the requests count goes beyond 6.5 k open files slowly move up to 2300 to 3000 and the request responses from tomcat become slow. This is pretty important, here. You are measuring two things: 1. Rise in file descriptor count 2. Application slowness You are assuming that #1 is causing #2. It's entirely possible that #2 is causing #1. The real question is "why is the application slowing down". Do you see CPU spikes? If not, check your db connections. If your db connection pool is fully-utilized (no more available), then you may have lots of request processing threads sitting there waiting on db connections. You'd see a rise in incoming connections (waiting) which aren't making any progress, and the application seems to "slow down", and there is a snowball effect where more requests means more waiting, and therefore more slowness. This would manifest as slow response times without any CPU spike.
You could also have a slow database and/or some other resource such as a downstream web service. I would investigate those options before trying to prove that fds don't scale on JVM or Linux (because they likely DO scale quite well). I am not concerned about high open files as I do not see any errors related to open files. Only side effect of open files going above 700 is the response from tomcat is slow. I checked if this is caused from elastic search, aws cloud watch shows elastic search response is within 5 milliseconds. what might be the reason that when the open files goes beyond 600, it slows down the response time for tomcat. I tried with tomcat 9 and it's the same behavior You might want to add some debug logging to your application when getting ready to contact e.g. a database or remote service. Something like: [timestamp] [thread-id] DEBUG Making call to X [timestamp] [thread-id] DEBUG Completed call to X or
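[Editorial sketch: the keepalive and thread settings discussed above live on the Tomcat HTTP Connector in conf/server.xml. This is an illustrative fragment showing Chris's 1-second suggestion, not the poster's actual configuration; every value is an assumption.]

```xml
<!-- conf/server.xml sketch; all values illustrative -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="2000"
           connectionTimeout="6000"
           keepAliveTimeout="1000" />
```

If keepAliveTimeout is not set explicitly, Tomcat falls back to connectionTimeout for it, which matches the 6-second keepalive behavior described in the thread.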
Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
Chris, That's correct, it's just a plain static hello world page I created to verify tomcat. It is served by tomcat. I have bundled this page in the same context where the service is running. When I create load on the service and then try to access the static hello world page browser keeps busy and does not return the page. I checked the database dashboard and the monitoring charts are normal, no spikes on cpu or any other resources of the database. The delay is noticeable when there are more than 1000 concurrent requests from each of 4 different JMeter test instances Why does tomcat not even serve the html page On Thu, Nov 12, 2020 at 7:01 PM Christopher Schultz < ch...@christopherschultz.net> wrote: > Ayub, > > On 11/12/20 10:47, Ayub Khan wrote: > > Chris, > > > > I am using hikaricp connection pooling and the maximum pool size is set > to > > 100, without specifying minimum idle connections. Even during high load I > > see there are more than 80 connections in idle state. > > > > I have setup debug statements to print the total time taken to complete > the > > request. The response time of completed call during load is around 5 > > seconds, the response time without load is around 400 to 500 milliseconds > > That's a significant difference. Is your database server showing high > CPU usage or more I/O usage during those high-load times? > > > During the load I cannot even access static html page > > Now *that* is an interesting data point. > > You are sure that the "static" request doesn't hit any other resources? > No filter is doing anything? No logging to an external service or > double-checking any security constraints in the db before serving the page? > > (And the static page is being returned by Tomcat, not nginx, right?) > > -chris > > > On Thu, Nov 12, 2020 at 4:59 PM Christopher Schultz < > > ch...@christopherschultz.net> wrote: > > > >> Ayub, > >> > >> On 11/11/20 16:16, Ayub Khan wrote: > >>> I was load testing using the ec2 load balancer dns. 
I have increased > the > >>> connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I > am > >>> not seeing connection timeout in nginx logs now. No errors in > kernel.log > >> I > >>> am not seeing any errors in tomcat catalina.out. > >> > >> The timeouts are most likely related to the connection timeout (and > >> therefore keepalive) setting. If you are proxying connections from nginx > >> and they should be staying open, you should really never be experiencing > >> a timeout between nginx and Tomcat. > >> > >>> During regular operations when the request count is between 4 to 6k > >>> requests per minute the open files count for the tomcat process is > >> between > >>> 200 to 350. Responses from tomcat are within 5 seconds. > >> > >> Good. > >> > >>> If the requests count goes beyond 6.5 k open files slowly move up to > >> 2300 > >>> to 3000 and the request responses from tomcat become slow. > >> > >> This is pretty important, here. You are measuring two things: > >> > >> 1. Rise in file descriptor count > >> 2. Application slowness > >> > >> You are assuming that #1 is causing #2. It's entirely possible that #2 > >> is causing #1. > >> > >> The real question is "why is the application slowing down". Do you see > >> CPU spikes? If not, check your db connections. > >> > >> If your db connection pool is fully-utilized (no more available), then > >> you may have lots of request processing threads sitting there waiting on > >> db connections. You'd see a rise in incoming connections (waiting) which > >> aren't making any progress, and the application seems to "slow down", > >> and there is a snowball effect where more requests means more waiting, > >> and therefore more slowness. This would manifest as slow response times > >> without any CPU spike. > >> > >> You could also have a slow database and/or some other resource such as a > >> downstream web service.
> >> > >> I would investigate those options before trying to prove that fds don't > >> scale on JVM or Linux (because they likely DO scale quite well). > >> > >>> I am not concerned about high open files as I do not see any errors > >> related > >>> to open files. Only side effect of open files going above 700 is the > >>> response from tomcat is slow. I checked if this is caused from elastic > >>> search, aws cloud watch shows elastic search response is within 5 > >>> milliseconds. > >>> > >>> what might be the reason that when the open files goes beyond 600, it > >> slows > >>> down the response time for tomcat. I tried with tomcat 9 and it's the > >> same > >>> behavior > >> > >> You might want to add some debug logging to your application when > >> getting ready to contact e.g. a database or remote service. Something > like: > >> > >> [timestamp] [thread-id] DEBUG Making call to X > >> [timestamp] [thread-id] DEBUG Completed call to X > >> > >> or > >> > >> [timestamp] [thread-id] DEBUG Call to X took [duration]ms > >> > >> Then have a look at all those logs when the appli
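[Editorial sketch: the debug-timing advice quoted above is easy to wrap up in a helper. This is a self-contained illustration; the class and method names (TimedCall, timed) are invented for the example and are not from the thread.]

```java
import java.util.function.Supplier;

public class TimedCall {
    // Wrap any downstream call (db, remote service) so every invocation
    // logs the "Making call" / "took Nms" pair suggested in the thread.
    public static <T> T timed(String target, Supplier<T> call) {
        String thread = Thread.currentThread().getName();
        System.out.printf("[%d] [%s] DEBUG Making call to %s%n",
                System.currentTimeMillis(), thread, target);
        long t0 = System.nanoTime();
        try {
            return call.get();
        } finally {
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.printf("[%d] [%s] DEBUG Call to %s took %dms%n",
                    System.currentTimeMillis(), thread, target, ms);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real db/remote call
        String result = timed("X", () -> "hello");
        System.out.println("result=" + result);
    }
}
```

Grepping these lines during a slowdown makes it obvious whether the time is going into the downstream call or into the servlet container itself.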
Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
Ayub, On 11/12/20 10:47, Ayub Khan wrote: Chris, I am using hikaricp connection pooling and the maximum pool size is set to 100, without specifying minimum idle connections. Even during high load I see there are more than 80 connections in idle state. I have setup debug statements to print the total time taken to complete the request. The response time of completed call during load is around 5 seconds, the response time without load is around 400 to 500 milliseconds That's a significant difference. Is your database server showing high CPU usage or more I/O usage during those high-load times? During the load I cannot even access static html page Now *that* is an interesting data point. You are sure that the "static" request doesn't hit any other resources? No filter is doing anything? No logging to an external service or double-checking any security constraints in the db before serving the page? (And the static page is being returned by Tomcat, not nginx, right?) -chris On Thu, Nov 12, 2020 at 4:59 PM Christopher Schultz < ch...@christopherschultz.net> wrote: Ayub, On 11/11/20 16:16, Ayub Khan wrote: I was load testing using the ec2 load balancer dns. I have increased the connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I am not seeing connection timeout in nginx logs now. No errors in kernel.log I am not seeing any errors in tomcat catalina.out. The timeouts are most likely related to the connection timeout (and therefore keepalive) setting. If you are proxying connections from nginx and they should be staying open, you should really never be experiencing a timeout between nginx and Tomcat. During regular operations when the request count is between 4 to 6k requests per minute the open files count for the tomcat process is between 200 to 350. Responses from tomcat are within 5 seconds. Good. If the requests count goes beyond 6.5 k open files slowly move up to 2300 to 3000 and the request responses from tomcat become slow. 
This is pretty important, here. You are measuring two things: 1. Rise in file descriptor count 2. Application slowness You are assuming that #1 is causing #2. It's entirely possible that #2 is causing #1. The real question is "why is the application slowing down". Do you see CPU spikes? If not, check your db connections. If your db connection pool is fully-utilized (no more available), then you may have lots of request processing threads sitting there waiting on db connections. You'd see a rise in incoming connections (waiting) which aren't making any progress, and the application seems to "slow down", and there is a snowball effect where more requests means more waiting, and therefore more slowness. This would manifest as slow response times without any CPU spike. You could also have a slow database and/or some other resource such as a downstream web service. I would investigate those options before trying to prove that fds don't scale on JVM or Linux (because they likely DO scale quite well). I am not concerned about high open files as I do not see any errors related to open files. Only side effect of open files going above 700 is the response from tomcat is slow. I checked if this is caused from elastic search, aws cloud watch shows elastic search response is within 5 milliseconds. what might be the reason that when the open files goes beyond 600, it slows down the response time for tomcat. I tried with tomcat 9 and it's the same behavior You might want to add some debug logging to your application when getting ready to contact e.g. a database or remote service. Something like: [timestamp] [thread-id] DEBUG Making call to X [timestamp] [thread-id] DEBUG Completed call to X or [timestamp] [thread-id] DEBUG Call to X took [duration]ms Then have a look at all those logs when the application slows down and see if you can observe a significant jump in the time-to-complete those operations.
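[Editorial sketch: the pool-exhaustion scenario described above can be illustrated without a database at all, because a connection pool is just a bounded resource. This is not HikariCP; it is a minimal stand-in using a Semaphore, with all numbers invented for the example.]

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolWaitDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for a pool with maximumPoolSize=2
        Semaphore pool = new Semaphore(2);

        // Two in-flight requests hold every "connection"
        pool.acquire(2);

        // A third request now just blocks -- no CPU is consumed while it
        // waits, which is exactly the "slow with idle CPU" symptom.
        // Here we give up after 200ms; a real request thread would wait
        // for the pool's configured connection timeout.
        boolean acquired = pool.tryAcquire(1, 200, TimeUnit.MILLISECONDS);

        System.out.println("acquired=" + acquired);
    }
}
```

Under load, every waiting request also keeps its socket (and therefore a file descriptor) open, so pool exhaustion alone can produce both the rising fd count and the slowness at once.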
Hope that helps, -chris - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
Chris, I am using hikaricp connection pooling and the maximum pool size is set to 100, without specifying minimum idle connections. Even during high load I see there are more than 80 connections in idle state. I have setup debug statements to print the total time taken to complete the request. The response time of completed call during load is around 5 seconds, the response time without load is around 400 to 500 milliseconds During the load I cannot even access static html page Using Jmeter, I executed 1500 requests to AWS elastic load balancer which had only one VM instance of nginx --> tomcat on the same VM and tomcat consumed total memory of 30Gig and CPU was at 28% On Thu, Nov 12, 2020 at 6:47 PM Ayub Khan wrote: > Chris, > > I am using hikaricp connection pooling and the maximum pool size is set to > 100, without specifying minimum idle connections. Even during high load I > see there are more than 80 connections in idle state. > > I have setup debug statements to print the total time taken to complete > the request. The response time of completed call during load is around 5 > seconds, the response time without load is around 400 to 500 milliseconds > > During the load I cannot even access static html page > > > > > > > On Thu, Nov 12, 2020 at 4:59 PM Christopher Schultz < > ch...@christopherschultz.net> wrote: > >> Ayub, >> >> On 11/11/20 16:16, Ayub Khan wrote: >> > I was load testing using the ec2 load balancer dns. I have increased the >> > connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I am >> > not seeing connection timeout in nginx logs now. No errors in >> kernel.log I >> > am not seeing any errors in tomcat catalina.out. >> >> The timeouts are most likely related to the connection timeout (and >> therefore keepalive) setting. If you are proxying connections from nginx >> and they should be staying open, you should really never be experiencing >> a timeout between nginx and Tomcat.
>> >> > During regular operations when the request count is between 4 to 6k >> > requests per minute the open files count for the tomcat process is >> between >> > 200 to 350. Responses from tomcat are within 5 seconds. >> >> Good. >> >> > If the requests count goes beyond 6.5 k open files slowly move up to >> 2300 >> > to 3000 and the request responses from tomcat become slow. >> >> This is pretty important, here. You are measuring two things: >> >> 1. Rise in file descriptor count >> 2. Application slowness >> >> You are assuming that #1 is causing #2. It's entirely possible that #2 >> is causing #1. >> >> The real question is "why is the application slowing down". Do you see >> CPU spikes? If not, check your db connections. >> >> If your db connection pool is fully-utilized (no more available), then >> you may have lots of request processing threads sitting there waiting on >> db connections. You'd see a rise in incoming connections (waiting) which >> aren't making any progress, and the application seems to "slow down", >> and there is a snowball effect where more requests means more waiting, >> and therefore more slowness. This would manifest as slow response times >> without any CPU spike. >> >> You could also have a slow database and/or some other resource such as a >> downstream web service. >> >> I would investigate those options before trying to prove that fds don't >> scale on JVM or Linux (because they likely DO scale quite well). >> >> > I am not concerned about high open files as I do not see any errors >> related >> > to open files. Only side effect of open files going above 700 is the >> > response from tomcat is slow. I checked if this is caused from elastic >> > search, aws cloud watch shows elastic search response is within 5 >> > milliseconds. >> > >> > what might be the reason that when the open files goes beyond 600, it >> slows >> > down the response time for tomcat.
I tried with tomcat 9 and it's the >> same >> > behavior >> >> You might want to add some debug logging to your application when >> getting ready to contact e.g. a database or remote service. Something >> like: >> >> [timestamp] [thread-id] DEBUG Making call to X >> [timestamp] [thread-id] DEBUG Completed call to X >> >> or >> >> [timestamp] [thread-id] DEBUG Call to X took [duration]ms >> >> Then have a look at all those logs when the application slows down and >> see if you can observe a significant jump in the time-to-complete those >> operations. >> >> Hope that helps, >> -chris >> > > -- > > Sun Certified Enterprise Architect 1.5 > Sun Certified Java Programmer 1.4 > Microsoft Certified Systems Engineer 2000 > http://in.linkedin.com/pub/ayub-khan/a/811/b81 > mobile:+966-502674604 > --
Re: only for remote access
Jürgen, On 11/12/20 09:50, Jürgen Weber wrote: Chris, it is just basic authentication. I definitely want authentication for remote access, but I had hoped I could override this with a Valve for local access. > Anyway, I'll spare the two apps and do two Servlet mappings /local /remote, protect /remote with a <security-constraint> and check in the servlet code if Servlet Path == local && remote IP in local network You can definitely do that with the RemoteIPValve and/or RemoteIPFilter. No need to write any new code. And I'll try to mod_rewrite /remote to /local if in local network. That would work, but be aware of playing games with URL spaces. It can be a real pain in the neck to hit every case. What's wrong with local users authenticating? I don't trust my network that much. -chris On Thu., 12 Nov. 2020 at 14:43, Christopher Schultz wrote: Jürgen, On 11/12/20 06:30, Jürgen Weber wrote: I'd like to have web app security if accessed from outside the local network. if (!local) check Is this possible? with RemoteHostValve ? You can simulate it, but you can't use a <security-constraint> in web.xml and also get a "local" carve-out for it. What kind of <security-constraint> are you trying to remove? Here are some options: 1. Review why you want to do this in the first place. What makes "local" so special? 2. Deploy two instances of your application, one of which only allows "local" access and does NOT have the <security-constraint> in web.xml. 3. Remove the <security-constraint> from web.xml completely, and use a Filter/Valve to enforce your security policy. -chris
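[Editorial sketch of option 3 above: enforcing the "local" carve-out with a filter instead of a <security-constraint>. Tomcat ships org.apache.catalina.filters.RemoteAddrFilter, which allows or denies by client-IP regex. The filter name, URL patterns, and address range below are illustrative assumptions, not the poster's configuration.]

```xml
<!-- web.xml sketch; names, patterns, and addresses are illustrative -->
<filter>
  <filter-name>localOnly</filter-name>
  <filter-class>org.apache.catalina.filters.RemoteAddrFilter</filter-class>
  <init-param>
    <!-- "allow" is a regular expression matched against the client IP -->
    <param-name>allow</param-name>
    <param-value>127\.0\.0\.1|192\.168\.1\.\d+</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>localOnly</filter-name>
  <url-pattern>/local/*</url-pattern>
</filter-mapping>
```

With /local/* gated by IP and /remote/* left behind the <security-constraint>, no servlet-side IP checking code is needed. Note that if requests arrive through a proxy, the client IP must first be restored (e.g. via RemoteIpValve), or the filter will only ever see the proxy's address.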
Re: only for remote access
Chris, it is just basic authentication. I definitely want authentication for remote access, but I had hoped I could override this with a Valve for local access. Anyway, I'll spare the two apps and do two Servlet mappings /local /remote, protect /remote with a <security-constraint> and check in the servlet code if Servlet Path == local && remote IP in local network And I'll try to mod_rewrite /remote to /local if in local network. Juergen On Thu., 12 Nov. 2020 at 14:43, Christopher Schultz wrote: > > Jürgen, > > On 11/12/20 06:30, Jürgen Weber wrote: > > I'd like to have web app security if accessed from outside the local > > network. > > > > if (!local) > > check > > > > > > Is this possible? with RemoteHostValve ? > > You can simulate it, but you can't use a <security-constraint> in web.xml > and also get a "local" carve-out for it. > > What kind of <security-constraint> are you trying to remove? > > Here are some options: > > 1. Review why you want to do this in the first place. What makes "local" > so special? > > 2. Deploy two instances of your application, one of which only allows > "local" access and does NOT have the <security-constraint> in web.xml. > > 3. Remove the <security-constraint> from web.xml completely, and use a > Filter/Valve to enforce your security policy. > > -chris
RE: Weirdest Tomcat Behavior Ever?
> -Original Message- > From: Mark Thomas > Sent: Thursday, November 12, 2020 4:08 AM > To: Tomcat Users List ; Eric Robinson > > Subject: Re: Weirdest Tomcat Behavior Ever? > > On 11/11/2020 22:48, Eric Robinson wrote: > >> -Original Message- > >> From: Mark Thomas > >> Sent: Monday, November 9, 2020 5:59 AM > >> To: users@tomcat.apache.org > >> Subject: Re: Weirdest Tomcat Behavior Ever? > >> > >> Eric, > >> > >> Time to prune the history and provide another summary I think. This > >> summary isn't complete. There is more information in the history of > >> the thread. I'm trying to focus on what seems to be the key information. > >> > > > > Hi Mark -- So sorry for going silent for a couple of days. Our organization > > is > neck-deep in a huge compliance project. Combine that with this issue we're > working on together, and it's Perfect Storm time around here. We have a big > meeting with the client and vendor tomorrow about all this and I'm working > like heck to prevent this important customer from jumping ship. > > Understood. Let me know if there is anything I can do to help. > > > Now back to it! > > > >> > >> Overview: > >> A small number of requests are receiving a completely empty (no > >> headers, no body) response. > >> > > > > Just a FIN packet and that's all. > > Agreed. > > >> Environment > >> Tomcat 7.0.72 > >> - BIO HTTP (issue also observed with NIO) > >> - Source unknown (probably ASF) > >> Java 1.8.0_221, Oracle > >> CentOS 7.5, Azure > >> Nginx reverse proxy > >> - Using HTTP/1.0 > >> - No keep-alive > >> - No compression > >> No (known) environment changes in the time period where this issue > >> started > > I keep coming back to this. Something triggered this problem (note that > trigger not necessarily the same as root cause). Given that the app, Tomcat > and JVM versions didn't change that again points to some other component. > Perfectly understandable. It's the oldest question in the diagnostic playbook. What changed? 
I wish I had an answer. Whatever it was, it impacted both upstream servers. > Picking just one of the wild ideas I've had: is there some sort of firewall, > IDS, > IPS etc. that might be doing connection tracking and is, for some reason, > getting it wrong and closing the connection in error? > There is no firewall or IDS software running on the upstreams. The only thing that comes to mind that may have been installed during that timeframe is Sophos antivirus and Solar Winds RMM. Sophos was the first thing I disabled when I saw the packet issues. > As an aside, I mentioned earlier in this thread a similar issue we have been > observing in the CI system. I tracked that down yesterday and I am certain > the issues are unrelated. The CI issue was NIO specific (we see this issue > with > BIO and NIO) and introduced by refactoring in 8.5.x (we see this issue in > 7.0.x). Sorry this doesn't help. > > >> Results from debug logging > >> - The request is read without error > >> - The connection close is initiated from the Tomcat/Java side > >> - The socket is closed before Tomcat tries to write the response > >> - The application is not triggering the close of the socket > >> - Tomcat is not triggering the close of the socket > >> - When Tomcat does try and write we see the following exception > >> java.net.SocketException: Bad file descriptor (Write failed) > >> > >> We have confirmed that the Java process is not hitting the limit for > >> file descriptors. > >> > >> The file descriptor must have been valid when the request was read > >> from the socket. > >> > >> The first debug log shows 2 other active connections from Nginx to > >> Tomcat at the point the connection is closed unexpectedly. > >> > >> The second debug log shows 1 other active connection from Nginx to > >> Tomcat at the point the connection is closed unexpectedly. > >> > >> The third debug log shows 1 other active connection from Nginx to > >> Tomcat at the point the connection is closed unexpectedly.
> >> > >> The fourth debug log shows no other active connection from Nginx to > >> Tomcat at the point the connection is closed unexpectedly. > >> > >> > >> Analysis > >> > >> We know the connection close isn't coming from Tomcat or the > application. > >> That leaves: > >> - the JVM > >> - the OS > >> - the virtualisation layer (since this is Azure I am assuming there is > >> one) > >> > >> We are approaching the limit of what we can debug via Tomcat (and my > >> area of expertise. The evidence so far is pointing to an issue lower > >> down the network stack (JVM, OS or virtualisation layer). > >> > > > > Can't disagree with you there. > > > >> I think the next, and possibly last, thing we can do from Tomcat is > >> log some information on the file descriptor associated with the > >> socket. That is going to require some reflection to read JVM internals. > >> > >> Patch files here: > >> http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v4/ > >> > >> Source code her
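[Editorial sketch supporting the analysis above: when the JVM itself closes a socket through the Socket API, a later write fails fast with "Socket is closed", because the JVM knows the socket's state. The "Bad file descriptor" in Mark's logs is an OS-level errno (EBADF), which is consistent with the fd being invalidated beneath the JVM rather than closed through the API. This self-contained example shows the JVM-side case for contrast.]

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ClosedSocketWrite {
    public static void main(String[] args) throws Exception {
        // Loopback pair: bind an ephemeral port, connect, accept
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            // Close through the Socket API first...
            accepted.close();
            try {
                // ...then attempt the response write, as Tomcat does
                accepted.getOutputStream().write("HTTP/1.1 200 OK\r\n".getBytes());
                System.out.println("write succeeded");
            } catch (IOException e) {
                // The JVM reports its own close; no EBADF involved
                System.out.println("write failed: " + e.getMessage());
            }
        }
    }
}
```

That Tomcat sees EBADF instead of "Socket is closed" is one more data point for the conclusion already reached in the thread: the close is happening below Tomcat and the application, in the JVM internals, OS, or virtualization layer.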
Re: Something I still don't quite understand, Re: Let's Encrypt with Tomcat behind httpd
James, On 11/5/20 12:07, James H. H. Lampert wrote: I'm intrigued by Mr. Schultz's suggestion of "Maybe you just want RedirectPermanent instead of Rewrite(Cond|Rule)?" Would that make a difference? Or is it just a matter of altering the RewriteCond clause to specifically ignore anything that looks like a Let's Encrypt challenge? Or is there something I can put on the default landing page for the subdomain, rather than in the VirtualHost, to cause the redirection? I'm just thinking that Redirect[*] is a simpler configuration than Rewrite(Cond|Rule). As I recall (unless there's a way to force-expire the cached challenge result on a certbot call), I have to wait until December to run another test. You can delete all your stuff, but LE will get upset if you make requests too frequently. There is a way to ask LE to let you "test" stuff and they will lower the frequency limits. I have forgotten how to do that, but it might be a good idea to look into it since you really are testing things at this point. -chris
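[Editorial sketch of the Redirect suggestion with a carve-out for the ACME challenge path. Hostnames and filesystem paths are invented for the example. mod_alias processes Alias and Redirect directives in configuration order, so listing the more specific challenge path first keeps it served locally.]

```apache
# httpd VirtualHost sketch -- hostnames and paths are illustrative
<VirtualHost *:80>
    ServerName sub.example.com
    # Serve Let's Encrypt HTTP-01 challenges locally; listed first so it
    # takes precedence over the site-wide redirect below
    Alias /.well-known/acme-challenge/ /var/www/acme/.well-known/acme-challenge/
    # Everything else gets a permanent (301) redirect
    RedirectPermanent / https://www.example.com/
</VirtualHost>
```

On the rate-limit point: certbot's --dry-run flag performs the request against the Let's Encrypt staging environment, which has much higher rate limits, so renewal tests don't have to wait for the production limits to reset.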
Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
Ayub, On 11/11/20 16:16, Ayub Khan wrote: I was load testing using the ec2 load balancer dns. I have increased the connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I am not seeing connection timeout in nginx logs now. No errors in kernel.log. I am not seeing any errors in tomcat catalina.out. The timeouts are most likely related to the connection timeout (and therefore keepalive) setting. If you are proxying connections from nginx and they should be staying open, you should really never be experiencing a timeout between nginx and Tomcat. During regular operations when the request count is between 4 to 6k requests per minute the open files count for the tomcat process is between 200 to 350. Responses from tomcat are within 5 seconds. Good. If the requests count goes beyond 6.5 k open files slowly move up to 2300 to 3000 and the request responses from tomcat become slow. This is pretty important, here. You are measuring two things:

1. Rise in file descriptor count
2. Application slowness

You are assuming that #1 is causing #2. It's entirely possible that #2 is causing #1. The real question is "why is the application slowing down". Do you see CPU spikes? If not, check your db connections. If your db connection pool is fully-utilized (no more available), then you may have lots of request processing threads sitting there waiting on db connections. You'd see a rise in incoming connections (waiting) which aren't making any progress, and the application seems to "slow down", and there is a snowball effect where more requests means more waiting, and therefore more slowness. This would manifest as slow response times without any CPU spike. You could also have a slow database and/or some other resource such as a downstream web service. I would investigate those options before trying to prove that fds don't scale on JVM or Linux (because they likely DO scale quite well). 
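For reference, the Connector settings under discussion look roughly like the sketch below. This is a hypothetical reconstruction assembled from values quoted in the thread (NIO protocol, a connector timeout of 6000 ms, maxThreads of 2000), not the poster's actual server.xml. Note that when keepAliveTimeout is not set explicitly, Tomcat uses the connectionTimeout value for it, which is how a 6000 ms connection timeout also yields the 6-second keepalive timeout mentioned in this thread.

```xml
<!-- Hypothetical sketch from values quoted in the thread.
     keepAliveTimeout falls back to connectionTimeout when unset,
     so this gives a 6 s keepalive timeout between nginx and Tomcat. -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="6000"
           maxThreads="2000"
           URIEncoding="UTF-8"
           redirectPort="8443" />
```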
I am not concerned about high open files as I do not see any errors related to open files. The only side effect of open files going above 700 is that the response from tomcat is slow. I checked if this is caused by elastic search; aws cloud watch shows elastic search response is within 5 milliseconds. What might be the reason that when the open files go beyond 600, it slows down the response time for tomcat? I tried with tomcat 9 and it's the same behavior. You might want to add some debug logging to your application when getting ready to contact e.g. a database or remote service. Something like:

[timestamp] [thread-id] DEBUG Making call to X
[timestamp] [thread-id] DEBUG Completed call to X

or

[timestamp] [thread-id] DEBUG Call to X took [duration]ms

Then have a look at all those logs when the application slows down and see if you can observe a significant jump in the time-to-complete those operations. Hope that helps, -chris 
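Chris's suggested duration logging can be sketched as a small, self-contained Java snippet. Everything here is illustrative: callX() is a hypothetical stand-in for the real database or remote-service call, and the log line simply mirrors the format suggested above.

```java
import java.time.Instant;

public class TimedCall {

    // Hypothetical stand-in for the real downstream call
    // (DB query, Elasticsearch lookup, web-service request, ...).
    static String callX() throws InterruptedException {
        Thread.sleep(50);
        return "ok";
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        String result = callX();
        long ms = (System.nanoTime() - start) / 1_000_000;
        // The suggested format: timestamp, thread id, duration.
        System.out.printf("[%s] [%s] DEBUG Call to X (%s) took %dms%n",
                Instant.now(), Thread.currentThread().getName(), result, ms);
    }
}
```

Comparing these durations under load against the 400-500 ms no-load baseline mentioned earlier in the thread would show whether the downstream call, rather than Tomcat itself, is where the extra seconds go.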
Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
Mark, The difference between after_start and after_load is the sockets below, which is just a sample from the repeated list; the ports are random. How do I know what these connections are related to?

java 5021 tomcat8 3162u IPv6 98361 0t0 TCP localhost:http-alt->localhost:51746 (ESTABLISHED)
java 5021 tomcat8 3163u IPv6 98362 0t0 TCP localhost:http-alt->localhost:51748 (ESTABLISHED)
java 5021 tomcat8 3164u IPv6 98363 0t0 TCP localhost:http-alt->localhost:51750 (ESTABLISHED)
java 5021 tomcat8 3165u IPv6 98364 0t0 TCP localhost:http-alt->localhost:51752 (ESTABLISHED)
java 5021 tomcat8 3166u IPv6 25334 0t0 TCP localhost:http-alt->localhost:51754 (ESTABLISHED)
java 5021 tomcat8 3167u IPv6 25335 0t0 TCP localhost:http-alt->localhost:51756 (ESTABLISHED)
java 5021 tomcat8 3168u IPv6 25336 0t0 TCP localhost:http-alt->localhost:51758 (ESTABLISHED)
java 5021 tomcat8 3169u IPv6 25337 0t0 TCP localhost:http-alt->localhost:51760 (ESTABLISHED)
java 5021 tomcat8 3170u IPv6 25338 0t0 TCP localhost:http-alt->localhost:51762 (ESTABLISHED)
java 5021 tomcat8 3171u IPv6 25339 0t0 TCP localhost:http-alt->localhost:51764 (ESTABLISHED)
java 5021 tomcat8 3172u IPv6 25340 0t0 TCP localhost:http-alt->localhost:51766 (ESTABLISHED)
java 5021 tomcat8 3173u IPv6 25341 0t0 TCP localhost:http-alt->localhost:51768 (ESTABLISHED)
java 5021 tomcat8 3174u IPv6 25342 0t0 TCP localhost:http-alt->localhost:51770 (ESTABLISHED)
java 5021 tomcat8 3175u IPv6 25343 0t0 TCP localhost:http-alt->localhost:51772 (ESTABLISHED)
java 5021 tomcat8 3176u IPv6 25344 0t0 TCP localhost:http-alt->localhost:51774 (ESTABLISHED)
java 5021 tomcat8 3177u IPv6 25345 0t0 TCP localhost:http-alt->localhost:51776 (ESTABLISHED)
java 5021 tomcat8 3178u IPv6 25346 0t0 TCP localhost:http-alt->localhost:51778 (ESTABLISHED)
java 5021 tomcat8 3179u IPv6 25347 0t0 TCP localhost:http-alt->localhost:51780 (ESTABLISHED)
java 5021 tomcat8 3180u IPv6 25348 0t0 TCP localhost:http-alt->localhost:51782 (ESTABLISHED)
java 5021 tomcat8 3181u IPv6 25349 0t0 TCP localhost:http-alt->localhost:51784 (ESTABLISHED)
java 5021 tomcat8 3182u IPv6 25350 0t0 TCP localhost:http-alt->localhost:51786 (ESTABLISHED)
java 5021 tomcat8 3183u IPv6 25351 0t0 TCP localhost:http-alt->localhost:51788 (ESTABLISHED)

On Thu, Nov 12, 2020 at 4:05 PM Martin Grigorov wrote: > On Thu, Nov 12, 2020 at 2:40 PM Ayub Khan wrote: > > > Martin, > > > > Could you provide me a command which you want me to run and provide you > the > > results which might help you to debug this issue ? > > > > 1) start your app and click around to load the usual FDs > 2) lsof -p `cat /var/run/tomcat8.pid` > after_start.txt > 3) load your app > 4) lsof -p `cat /var/run/tomcat8.pid` > after_load.txt > > you can analyze the differences in the files yourself before sending them > to us :-) > > > > > > > > On Thu, Nov 12, 2020 at 1:36 PM Martin Grigorov > > wrote: > > > > > On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan wrote: > > > > > > > Martin, > > > > > > > > These are file descriptors, some are related to the jar files which > are > > > > included in the web application and some are related to the sockets > > from > > > > nginx to tomcat and some are related to database connections. I use > the > > > > below command to count the open file descriptors > > > > > > > > > > which type of connections increase ? > > > the sockets ? the DB ones ? > > > > > > > > > > > > > > watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l" > > > > > > > > > > you can also use lsof command > > > > > > > > > > > > > > > > > > > > > > On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov < > mgrigo...@apache.org > > > > > > > wrote: > > > > > > > > > On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan > > wrote: > > > > > > > > > > > Chris, > > > > > > > > > > > > I was load testing using the ec2 load balancer dns. I have > > increased > > > > the > > > > > > connector timeout to 6000 and also gave 32gig to the JVM of > > tomcat. I > > > > am > > > > > > not seeing connection timeout in nginx logs now. 
No errors in > > > > kernel.log > > > > > I > > > > > > am not seeing any errors in tomcat catalina.out. > > > > > > During regular operations when the request count is between 4 to > 6k > > > > > > requests per minute the open files count for the tomcat process > is > > > > > between > > > > > > 200 to 350. Responses from tom
Re: only for remote access
Jürgen, On 11/12/20 06:30, Jürgen Weber wrote: I'd like to have web app security if accessed from outside the local network. if (!local) check Is this possible? with RemoteHostValve ? You can simulate it, but you can't use <security-constraint> in web.xml and also get a "local" carve-out for it. What kind of <security-constraint> are you trying to remove? Here are some options:

1. Review why you want to do this in the first place. What makes "local" so special?
2. Deploy two instances of your application, one of which only allows "local" access and does NOT have the <security-constraint> in web.xml.
3. Remove the <security-constraint> from web.xml completely, and use a Filter/Valve to enforce your security policy.

-chris 
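Option 3, a Filter/Valve enforcing the policy, hinges on deciding whether the caller counts as "local". A minimal sketch of that check follows; the class name and address ranges are assumptions (adjust them to your network), and in a real Filter you would feed it the value of ServletRequest.getRemoteAddr() before deciding whether to demand authentication.

```java
// Sketch of the "is this caller local?" test a Filter/Valve could use
// before enforcing security. Class name and ranges are illustrative only.
public class LocalCheck {

    static boolean isLocal(String addr) {
        return addr.equals("127.0.0.1") || addr.equals("::1")
                || addr.startsWith("10.") || addr.startsWith("192.168.");
    }

    public static void main(String[] args) {
        System.out.println(isLocal("127.0.0.1"));   // loopback counts as local
        System.out.println(isLocal("203.0.113.9")); // a public address does not
    }
}
```

Note that remote-address checks are only as trustworthy as the network in front of Tomcat: behind a proxy you would need the real client address (e.g. via RemoteIpValve) rather than the proxy's.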
Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
On Thu, Nov 12, 2020 at 2:40 PM Ayub Khan wrote: > Martin, > > Could you provide me a command which you want me to run and provide you the > results which might help you to debug this issue ? > 1) start your app and click around to load the usual FDs 2) lsof -p `cat /var/run/tomcat8.pid` > after_start.txt 3) load your app 4) lsof -p `cat /var/run/tomcat8.pid` > after_load.txt you can analyze the differences in the files yourself before sending them to us :-) > > > On Thu, Nov 12, 2020 at 1:36 PM Martin Grigorov > wrote: > > > On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan wrote: > > > > > Martin, > > > > > > These are file descriptors, some are related to the jar files which are > > > included in the web application and some are related to the sockets > from > > > nginx to tomcat and some are related to database connections. I use the > > > below command to count the open file descriptors > > > > > > > which type of connections increase ? > > the sockets ? the DB ones ? > > > > > > > > > > watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l" > > > > > > > you can also use lsof command > > > > > > > > > > > > > > > > On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov > > > > wrote: > > > > > > > On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan > wrote: > > > > > > > > > Chris, > > > > > > > > > > I was load testing using the ec2 load balancer dns. I have > increased > > > the > > > > > connector timeout to 6000 and also gave 32gig to the JVM of > tomcat. I > > > am > > > > > not seeing connection timeout in nginx logs now. No errors in > > > kernel.log > > > > I > > > > > am not seeing any errors in tomcat catalina.out. > > > > > During regular operations when the request count is between 4 to 6k > > > > > requests per minute the open files count for the tomcat process is > > > > between > > > > > 200 to 350. Responses from tomcat are within 5 seconds. 
> > > > > If the requests count goes beyond 6.5 k open files slowly move up > to > > > > 2300 > > > > > to 3000 and the request responses from tomcat become slow. > > > > > > > > > > I am not concerned about high open files as I do not see any errors > > > > related > > > > > to open files. Only side effect of open files going above 700 is > the > > > > > response from tomcat is slow. I checked if this is caused from > > elastic > > > > > search, aws cloud watch shows elastic search response is within 5 > > > > > milliseconds. > > > > > > > > > > what might be the reason that when the open files goes beyond 600, > it > > > > slows > > > > > down the response time for tomcat. I tried with tomcat 9 and it's > the > > > > same > > > > > behavior > > > > > > > > > > > > > Do you know what kind of files are being opened ? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz < > > > > > ch...@christopherschultz.net> wrote: > > > > > > > > > > > Ayub, > > > > > > > > > > > > On 11/3/20 10:56, Ayub Khan wrote: > > > > > > > *I'm curious about why you are using all of cloudflare and ALB > > and > > > > > > > nginx. Seems like any one of those could provide what you are > > > getting > > > > > from > > > > > > > all 3 of them. * > > > > > > > > > > > > > > Cloudflare is doing just the DNS and nginx is doing ssl > > termination > > > > > > > > > > > > What do you mean "Cloudflare is doing just the DNS?" > > > > > > > > > > > > So what is ALB doing, then? > > > > > > > > > > > > > *What is the maximum number of simultaneous requests that one > > > > > > nginx instance > > > > > > > will accept? What is the maximum number of simultaneous > > > > proxied requests > > > > > > one > > > > > > > nginx instance will make to a back-end Tomcat node? How many > nginx > > > > nodes > > > > > > do > > > > > > > you have? How many Tomcat nodes? 
* > > > > > > > > > > > > > > We have 4 vms each having nginx and tomcat running on them and > > each > > > > > > tomcat > > > > > > > has nginx in front of them to proxy the requests. So it's one > > Nginx > > > > > > > proxying to a dedicated tomcat on the same VM. > > > > > > > > > > > > Okay. > > > > > > > > > > > > > below is the tomcat connector configuration > > > > > > > > > > > > > > > > > > > > connectionTimeout="6" maxThreads="2000" > > > > > > > > > > protocol="org.apache.coyote.http11.Http11NioProtocol" > > > > > > > URIEncoding="UTF-8" > > > > > > > redirectPort="8443" /> > > > > > > > > > > > > 60 seconds is a *long* time for a connection timeout. > > > > > > > > > > > > Do you actually need 2000 threads? That's a lot, though not > insane. > > > > 2000 > > > > > > threads means you expect to handle 2000 concurrent (non-async, > > > > > > non-WebSocket) requests. Do you need that (per node)? Are you > > > > expecting > > > > > > 8000 concurrent requests? Does your load-balancer understand the > > > > > > topography and current-load on any given node? > > > > > > > > > > > > > When I am doi
Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
Martin, Could you provide me a command which you want me to run and provide you the results which might help you to debug this issue ? On Thu, Nov 12, 2020 at 1:36 PM Martin Grigorov wrote: > On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan wrote: > > > Martin, > > > > These are file descriptors, some are related to the jar files which are > > included in the web application and some are related to the sockets from > > nginx to tomcat and some are related to database connections. I use the > > below command to count the open file descriptors > > > > which type of connections increase ? > the sockets ? the DB ones ? > > > > > > watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l" > > > > you can also use lsof command > > > > > > > > > > On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov > > wrote: > > > > > On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan wrote: > > > > > > > Chris, > > > > > > > > I was load testing using the ec2 load balancer dns. I have increased > > the > > > > connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I > > am > > > > not seeing connection timeout in nginx logs now. No errors in > > kernel.log > > > I > > > > am not seeing any errors in tomcat catalina.out. > > > > During regular operations when the request count is between 4 to 6k > > > > requests per minute the open files count for the tomcat process is > > > between > > > > 200 to 350. Responses from tomcat are within 5 seconds. > > > > If the requests count goes beyond 6.5 k open files slowly move up to > > > 2300 > > > > to 3000 and the request responses from tomcat become slow. > > > > > > > > I am not concerned about high open files as I do not see any errors > > > related > > > > to open files. Only side effect of open files going above 700 is the > > > > response from tomcat is slow. I checked if this is caused from > elastic > > > > search, aws cloud watch shows elastic search response is within 5 > > > > milliseconds. 
> > > > > > > > what might be the reason that when the open files goes beyond 600, it > > > slows > > > > down the response time for tomcat. I tried with tomcat 9 and it's the > > > same > > > > behavior > > > > > > > > > > Do you know what kind of files are being opened ? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz < > > > > ch...@christopherschultz.net> wrote: > > > > > Ayub, > > > > > > > > > > On 11/3/20 10:56, Ayub Khan wrote: > > > > > > *I'm curious about why you are using all of cloudflare and ALB > and > > > > > > nginx. Seems like any one of those could provide what you are > > getting > > > > from > > > > > > all 3 of them. * > > > > > > > > > > > > Cloudflare is doing just the DNS and nginx is doing ssl > termination > > > > > > > > > > What do you mean "Cloudflare is doing just the DNS?" > > > > > > > > > > So what is ALB doing, then? > > > > > > > > > > > *What is the maximum number of simultaneous requests that one > > > > > nginx instance > > > > > > will accept? What is the maximum number of simultaneous > > > proxied requests > > > > > one > > > > > > nginx instance will make to a back-end Tomcat node? How many nginx > > > nodes > > > > > do > > > > > > you have? How many Tomcat nodes? * > > > > > > > > > > > > We have 4 vms each having nginx and tomcat running on them and > each > > > > > tomcat > > > > > > has nginx in front of them to proxy the requests. So it's one > Nginx > > > > > > proxying to a dedicated tomcat on the same VM. > > > > > > > > > > Okay. > > > > > > > > > > > below is the tomcat connector configuration > > > > > > > > > > > > > > > > > connectionTimeout="6" maxThreads="2000" > > > > > > > > protocol="org.apache.coyote.http11.Http11NioProtocol" > > > > > > URIEncoding="UTF-8" > > > > > > redirectPort="8443" /> > > > > > > > > > > 60 seconds is a *long* time for a connection timeout. > > > > > > > > > > Do you actually need 2000 threads? 
That's a lot, though not insane. > > > 2000 > > > > > threads means you expect to handle 2000 concurrent (non-async, > > > > > non-WebSocket) requests. Do you need that (per node)? Are you > > > expecting > > > > > 8000 concurrent requests? Does your load-balancer understand the > > > > > topography and current-load on any given node? > > > > > > > > > > > When I am doing a load test of 2000 concurrent users I see the > open > > > > files > > > > > > increase to 10,320 and when I take thread dump I see the threads > are > > > > in a > > > > > > waiting state. Slowly as the requests are completed I see the open > > > files > > > > > > come down to normal levels. > > > > > > > > > > Are you performing your load-test against the CF/ALB/nginx/Tomcat > > > stack, > > > > > or just hitting Tomcat (or nginx) directly? > > > > > > > > > > Are you using HTTP keepalive in your load-test (from the client to > > > > > whichever server is being contacted)? > > > > > > > > > > > The output o
only for remote access
Hi, I'd like to have web app security if accessed from outside the local network. if (!local) check Is this possible? with RemoteHostValve ? Thx, Juergen 
Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan wrote: > Martin, > > These are file descriptors, some are related to the jar files which are > included in the web application and some are related to the sockets from > nginx to tomcat and some are related to database connections. I use the > below command to count the open file descriptors > which type of connections increase ? the sockets ? the DB ones ? > > watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l" > you can also use lsof command > > > > On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov > wrote: > > > On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan wrote: > > > > > Chris, > > > > > > I was load testing using the ec2 load balancer dns. I have increased > the > > > connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I > am > > > not seeing connection timeout in nginx logs now. No errors in > kernel.log > > I > > > am not seeing any errors in tomcat catalina.out. > > > During regular operations when the request count is between 4 to 6k > > > requests per minute the open files count for the tomcat process is > > between > > > 200 to 350. Responses from tomcat are within 5 seconds. > > > If the requests count goes beyond 6.5 k open files slowly move up to > > 2300 > > > to 3000 and the request responses from tomcat become slow. > > > > > > I am not concerned about high open files as I do not see any errors > > related > > > to open files. Only side effect of open files going above 700 is the > > > response from tomcat is slow. I checked if this is caused from elastic > > > search, aws cloud watch shows elastic search response is within 5 > > > milliseconds. > > > > > > what might be the reason that when the open files goes beyond 600, it > > slows > > > down the response time for tomcat. I tried with tomcat 9 and it's the > > same > > > behavior > > > > > > > Do you know what kind of files are being opened ? 
> > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz < > > > ch...@christopherschultz.net> wrote: > > > > Ayub, > > > > > > > > On 11/3/20 10:56, Ayub Khan wrote: > > > > > *I'm curious about why you are using all of cloudflare and ALB and > > > > > nginx. Seems like any one of those could provide what you are > getting > > > from > > > > > all 3 of them. * > > > > > > > > > > Cloudflare is doing just the DNS and nginx is doing ssl termination > > > > > > > > What do you mean "Cloudflare is doing just the DNS?" > > > > > > > > So what is ALB doing, then? > > > > > > > > > *What is the maximum number of simultaneous requests that one > > > > nginx instance > > > > > will accept? What is the maximum number of simultaneous > > proxied requests > > > > one > > > > > nginx instance will make to a back-end Tomcat node? How many nginx > > nodes > > > > do > > > > > you have? How many Tomcat nodes? * > > > > > > > > > > We have 4 vms each having nginx and tomcat running on them and each > > > > tomcat > > > > > has nginx in front of them to proxy the requests. So it's one Nginx > > > > > proxying to a dedicated tomcat on the same VM. > > > > > > > > Okay. > > > > > > > > > below is the tomcat connector configuration > > > > > > > > > > > > > > connectionTimeout="6" maxThreads="2000" > > > > > > protocol="org.apache.coyote.http11.Http11NioProtocol" > > > > > URIEncoding="UTF-8" > > > > > redirectPort="8443" /> > > > > > > > > 60 seconds is a *long* time for a connection timeout. > > > > > > > > Do you actually need 2000 threads? That's a lot, though not insane. > > 2000 > > > > threads means you expect to handle 2000 concurrent (non-async, > > > > non-WebSocket) requests. Do you need that (per node)? Are you > > expecting > > > > 8000 concurrent requests? Does your load-balancer understand the > > > > topography and current-load on any given node? 
> > > > > > > > > When I am doing a load test of 2000 concurrent users I see the open > > > files > > > > > increase to 10,320 and when I take thread dump I see the threads > are > > > in a > > > > > waiting state. Slowly as the requests are completed I see the open > > files > > > > > come down to normal levels. > > > > > > > > Are you performing your load-test against the CF/ALB/nginx/Tomcat > > stack, > > > > or just hitting Tomcat (or nginx) directly? > > > > > > > > Are you using HTTP keepalive in your load-test (from the client to > > > > whichever server is being contacted)? > > > > > > > > > The output of the below command is > > > > > sudo cat /proc/sys/kernel/pid_max > > > > > 131072 > > > > > > > > > > I am testing this on a c4.8xlarge VM in AWS. > > > > > > > > > > below is the config I changed in nginx.conf file > > > > > > > > > > events { > > > > > worker_connections 5; > > > > > # multi_accept on; > > > > > } > > > > This will allow 50k incoming connections, and Tomcat will accept an > > > > unbounded number of connections (for NIO connect
Re: Weirdest Tomcat Behavior Ever?
On 11/11/2020 22:48, Eric Robinson wrote: >> -Original Message- >> From: Mark Thomas >> Sent: Monday, November 9, 2020 5:59 AM >> To: users@tomcat.apache.org >> Subject: Re: Weirdest Tomcat Behavior Ever? >> >> Eric, >> >> Time to prune the history and provide another summary I think. This >> summary isn't complete. There is more information in the history of the >> thread. I'm trying to focus on what seems to be the key information. >> > > Hi Mark -- So sorry for going silent for a couple of days. Our organization > is neck-deep in a huge compliance project. Combine that with this issue we're > working on together, and it's Perfect Storm time around here. We have a big > meeting with the client and vendor tomorrow about all this and I'm working > like heck to prevent this important customer from jumping ship. Understood. Let me know if there is anything I can do to help. > Now back to it! > >> >> Overview: >> A small number of requests are receiving a completely empty (no headers, >> no body) response. >> > > Just a FIN packet and that's all. Agreed. >> Environment >> Tomcat 7.0.72 >> - BIO HTTP (issue also observed with NIO) >> - Source unknown (probably ASF) >> Java 1.8.0_221, Oracle >> CentOS 7.5, Azure >> Nginx reverse proxy >> - Using HTTP/1.0 >> - No keep-alive >> - No compression >> No (known) environment changes in the time period where this issue started I keep coming back to this. Something triggered this problem (note that trigger not necessarily the same as root cause). Given that the app, Tomcat and JVM versions didn't change that again points to some other component. Picking just one of the wild ideas I've had is there some sort of firewall, IDS, IPS etc. that might be doing connection tracking and is, for some reason, getting it wrong and closing the connection in error? As an aside, I mentioned earlier in this thread a similar issue we have been observing in the CI system. I tracked that down yesterday and I am certain the issues are unrelated. 
The CI issue was NIO specific (we see this issue with BIO and NIO) and introduced by refactoring in 8.5.x (we see this issue in 7.0.x). Sorry this doesn't help. >> Results from debug logging >> - The request is read without error >> - The connection close is initiated from the Tomcat/Java side >> - The socket is closed before Tomcat tries to write the response >> - The application is not triggering the close of the socket >> - Tomcat is not triggering the close of the socket >> - When Tomcat does try and write we see the following exception >> java.net.SocketException: Bad file descriptor (Write failed) >> >> We have confirmed that the Java process is not hitting the limit for file >> descriptors. >> >> The file descriptor must have been valid when the request was read from >> the socket. >> >> The first debug log shows 2 other active connections from Nginx to Tomcat at >> the point the connection is closed unexpectedly. >> >> The second debug log shows 1 other active connection from Nginx to Tomcat >> at the point the connection is closed unexpectedly. >> >> The third debug log shows 1 other active connection from Nginx to Tomcat at >> the point the connection is closed unexpectedly. >> >> The fourth debug log shows no other active connection from Nginx to >> Tomcat at the point the connection is closed unexpectedly. >> >> >> Analysis >> >> We know the connection close isn't coming from Tomcat or the application. >> That leaves: >> - the JVM >> - the OS >> - the virtualisation layer (since this is Azure I am assuming there is >> one) >> >> We are approaching the limit of what we can debug via Tomcat (and my area >> of expertise. The evidence so far is pointing to an issue lower down the >> network stack (JVM, OS or virtualisation layer). >> > > Can't disagree with you there. > >> I think the next, and possibly last, thing we can do from Tomcat is log some >> information on the file descriptor associated with the socket. 
That is going >> to >> require some reflection to read JVM internals. >> >> Patch files here: >> http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v4/ >> >> Source code here: >> https://github.com/markt-asf/tomcat/tree/debug-7.0.72 >> > > I will apply these tonight. > >> The file descriptor usage count is guarded by a lock object so this patch >> adds >> quite a few syncs. For the load you are seeing that shouldn't be an issue but >> there is a chance it will impact performance. >> > > Based on observation of load, I'm not too concerned about that. Maybe a > little. I'll keep an eye on it. > >> The aim with this logging is to provide evidence of whether or not there is a >> file descriptor handling problem in the JRE. My expectation is that with >> these >> logs we will have reached the limit of what we can do with Tomcat but will be >> able to point you in the right direction for further investigation. >> > > I'll get this done right away. Thanks. Mark --
Re: Timeout waiting to read data from client
On 11/11/2020 22:32, Jerry Malcolm wrote: > On 11/9/2020 11:05 AM, Jerry Malcolm wrote: >> >> On 11/9/2020 3:10 AM, Mark Thomas wrote: >>> On 08/11/2020 01:33, Jerry Malcolm wrote: On 11/7/2020 6:56 PM, Christopher Schultz wrote: > Jerry, > > On 11/6/20 19:49, Jerry Malcolm wrote: >> I have a relatively new environment with a standalone tomcat (8.5) >> running on an AWS Linux2 EC2. I'm not using HTTPD/AJP. Its a direct >> connection to port 443. (Well technically, I have firewallD in the >> flow in order to route the protected port 80 to port 8080 and 443 to >> 8443 for TC). >> >> I am doing some stress testing on the server and failing miserably. >> I am sending around 130 ajax calls in rapid succession using HTTP/2. >> These are all very simple small page (JSP) requests. Not a lot of >> processing required. The first ~75 requests process normally. Then >> everything hangs up. In the tomcat logs I'm getting a bunch of >> "Timeout waiting to read data from client" exceptions. And in the >> stacktrace for these exceptions, they are all occurring when I'm >> trying to access a parameter from the request. Looking at the >> request network timing in the browser console, I see a bunch of the >> requests returning in typical time of a few milliseconds. Then >> another large block of requests that all start returning around 4 >> seconds, then another block that wait until 8 seconds to return. >> I've tried firefox and chrome with the same results. >> >> I've been using httpd in front of TC for years. So this is the first >> time I'm running TC standalone. It is very likely I've got some >> parameters set horribly wrong. But I have no clue where to start. >> This is not a tiny EC2, and my internet connection is not showing any >> signs of problems. So I really don't think this is a >> performance-related problem. The problem is very consistent and >> reproducible with the same counts of success/failure calls. 
What >> could be causing the "Timeout waiting to read data from client" after >> 75 calls, and then cause blocks of calls to wait 4 seconds, 8 >> seconds, etc before responding? I really need to handle more >> simultaneous load than this is currently allowing. >> >> Thanks in advance for the education. > Are you using HTTP Keepalives on your connections? Are you actually > re-using those connections in your test? What is your keepalive > timeout on your <Connector>. Actually, what is your whole > configuration? > > -chris > Hi Chris, here are my two connector definitions from server.xml: >>> <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2" redirectPort="443" /> >>> <Connector port="8443" maxThreads="150" connectionTimeout="2" SSLEnabled="true" scheme="https" secure="true" clientAuth="false" SSLCertificateFile="ssl/a.com/cert.pem" SSLCertificateChainFile="ssl/a.com/chain.pem" SSLCertificateKeyFile="ssl/a.com/privkey.pem"> >>> <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" /> >>> </Connector> >>> How are you stress testing this? All on a single HTTP/2 connection or >>> multiple connections? With which tool? >>> >>> You might want to test HTTP/1.1 requests (with and without TLS) to see >>> if the problem is specific to HTTP/2 or TLS as that should help narrow >>> down the root cause. >>> >>> Mark >> >> Hi Mark, technically it's not a 'designed' stress test. It's real >> production code that just happens to stress the server more than >> usual. It's just a page that makes a bunch of ajax calls, and the >> responses to each of those issue a second ajax call. >> >> If you don't see anything obvious in my configuration, we will >> definitely pursue the http/1.1 options, etc. I just wanted to >> eliminate the chance of obvious 'pilot error' before digging deeper. >> >> Specifically, where is that error detected in the TC flow? In my logs >> it fails on getting request parameters. It sounds like the input >> reader for the request is getting blocked. 
>> But the first part of the request is getting in, since it does route to the appropriate JSP. Just seems strange that the http/2 or ssl layers would let half of the request in and then block the rest of the request. The browser appears to be sending everything. And it fails the same using Firefox or Chrome. Any ideas?
>>
>> Thx
>
> Update on this. One of our clients got ERR_HTTP2_SERVER_REFUSED_STREAM after things locked up. I removed the http2 'upgrade protocol' line from my connector, and everything works. So it's apparently something wrong with my http2 setup. Ideas? (See my connector config above in this thread.)

Tomcat only issues that e
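For anyone hitting this later: ERR_HTTP2_SERVER_REFUSED_STREAM is what a browser reports when the server resets a stream with REFUSED_STREAM, which Tomcat does when a client opens more concurrent streams than the connection allows. A sketch of making that limit explicit on the upgrade protocol is below; it is illustrative only (the TLS attributes from the connector above are elided, and the maxConcurrentStreams value is an assumption, not a recommendation; check the Tomcat 8.5 HTTP/2 documentation for the actual defaults):

```xml
<!-- Sketch only, not a drop-in fix. The value 200 is illustrative;
     verify the default and semantics in the Tomcat 8.5 docs. -->
<Connector port="8443" maxThreads="150" SSLEnabled="true"
           scheme="https" secure="true">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol"
                     maxConcurrentStreams="200" />
</Connector>
```

If ~130 rapid Ajax calls share one HTTP/2 connection, a per-connection stream cap would explain why a fixed-size batch succeeds and the remainder stalls or is refused.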
Re: NGINX + tomcat 8.0.35 (110: Connection timed out)
Martin,

These are file descriptors: some are related to the jar files which are included in the web application, some are related to the sockets from nginx to tomcat, and some are related to database connections. I use the command below to count the open file descriptors:

    watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l"

On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov wrote:
> On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan wrote:
>> Chris,
>>
>> I was load testing using the ec2 load balancer dns. I have increased the connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I am not seeing connection timeout in nginx logs now. No errors in kernel.log. I am not seeing any errors in tomcat catalina.out.
>>
>> During regular operations, when the request count is between 4k to 6k requests per minute, the open files count for the tomcat process is between 200 and 350, and responses from tomcat are within 5 seconds. If the request count goes beyond 6.5k, open files slowly move up to 2300 to 3000, and the request responses from tomcat become slow.
>>
>> I am not concerned about high open files as I do not see any errors related to open files. The only side effect of open files going above 700 is that the response from tomcat is slow. I checked if this is caused by elastic search; aws cloud watch shows elastic search response is within 5 milliseconds.
>>
>> What might be the reason that when the open files go beyond 600, it slows down the response time for tomcat? I tried with tomcat 9 and it's the same behavior.
>
> Do you know what kind of files are being opened?
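To answer Martin's question with data rather than the bare count from the `watch` command above, a small sketch like this breaks a process's fds down by type (assumes Linux /proc and permission to read the target fd directory; it defaults to the current shell's pid so it can be tried safely, and the tomcat8.pid substitution is the same one used above):

```shell
#!/bin/sh
# Sketch: classify a process's open file descriptors.
# Default to our own pid; for Tomcat use: pid=$(cat /var/run/tomcat8.pid)
pid=${1:-$$}
ls -l "/proc/$pid/fd" 2>/dev/null | awk '
  /-> socket:/ { sock++ }    # sockets (nginx upstream, DB, ES, ...)
  /\.jar$/     { jar++ }     # jar files from the webapp
  / -> /       { total++ }   # every fd symlink
  END { printf "total=%d sockets=%d jars=%d\n", total, sock, jar }'
```

Watching the `sockets=` figure during a load test would show whether the climb from ~350 to ~3000 open files is connections piling up (sockets) or something else.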
> On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz <ch...@christopherschultz.net> wrote:
>> Ayub,
>>
>> On 11/3/20 10:56, Ayub Khan wrote:
>>> *I'm curious about why you are using all of cloudflare and ALB and nginx. Seems like any one of those could provide what you are getting from all 3 of them.*
>>>
>>> Cloudflare is doing just the DNS and nginx is doing ssl termination
>>
>> What do you mean "Cloudflare is doing just the DNS?"
>>
>> So what is ALB doing, then?
>>
>>> *What is the maximum number of simultaneous requests that one nginx instance will accept? What is the maximum number of simultaneous proxied requests one nginx instance will make to a back-end Tomcat node? How many nginx nodes do you have? How many Tomcat nodes?*
>>>
>>> We have 4 vms each having nginx and tomcat running on them, and each tomcat has nginx in front of it to proxy the requests. So it's one nginx proxying to a dedicated tomcat on the same VM.
>>
>> Okay.
>>
>>> below is the tomcat connector configuration
>>>
>>>     <Connector connectionTimeout="60000" maxThreads="2000"
>>>                protocol="org.apache.coyote.http11.Http11NioProtocol"
>>>                URIEncoding="UTF-8"
>>>                redirectPort="8443" />
>>
>> 60 seconds is a *long* time for a connection timeout.
>>
>> Do you actually need 2000 threads? That's a lot, though not insane. 2000 threads means you expect to handle 2000 concurrent (non-async, non-WebSocket) requests. Do you need that (per node)? Are you expecting 8000 concurrent requests? Does your load-balancer understand the topography and current load on any given node?
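As a sketch of the direction Chris is nudging toward — a much shorter connection timeout and a thread count sized to measured concurrency — something like the following; every value here is illustrative, not a recommendation, and should be tuned against observed load:

```xml
<!-- Illustrative only. connectionTimeout/keepAliveTimeout are in
     milliseconds; maxThreads sized to expected per-node concurrency. -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="5000"
           keepAliveTimeout="1000"
           maxThreads="400"
           acceptCount="100"
           URIEncoding="UTF-8"
           redirectPort="8443" />
```

The trade-off: a short connectionTimeout frees idle connections (and their file descriptors) quickly under load, at the cost of dropping genuinely slow clients.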
>>> When I am doing a load test of 2000 concurrent users, I see the open files increase to 10,320, and when I take a thread dump I see the threads are in a waiting state. Slowly, as the requests are completed, I see the open files come down to normal levels.
>>
>> Are you performing your load-test against the CF/ALB/nginx/Tomcat stack, or just hitting Tomcat (or nginx) directly?
>>
>> Are you using HTTP keepalive in your load-test (from the client to whichever server is being contacted)?
>>
>>> The output of the below command is:
>>>
>>>     sudo cat /proc/sys/kernel/pid_max
>>>     131072
>>>
>>> I am testing this on a c4.8xlarge VM in AWS.
>>>
>>> below is the config I changed in nginx.conf file
>>>
>>>     events {
>>>         worker_connections 50000;
>>>         # multi_accept on;
>>>     }
>>
>> This will allow 50k incoming connections, and Tomcat will accept an unbounded number of connections (for the NIO connector). So limiting your threads to 2000 only means that the work of each request will be done in groups of 2000.
>>
>>>     worker_rlimit_nofile 30000;
>>
>> I'm not sure how many connections are handled by a single nginx worker. If you accept 50k connections and only allow 30k file handles, you may have a problem if that's all being done by a single worker.
>>
>>> What would b
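Chris's mismatch between worker_connections and worker_rlimit_nofile can be sketched as an nginx.conf fragment like the one below. The numbers are illustrative, not tuned values; the guiding assumption (worth verifying in the nginx docs) is that a proxied request can consume two descriptors per worker, one for the client side and one for the upstream connection to Tomcat, so the fd limit should comfortably exceed 2 × worker_connections:

```
# Illustrative nginx.conf fragment, not a tuned config.
worker_processes auto;
worker_rlimit_nofile 120000;   # > 2 x worker_connections, with headroom

events {
    worker_connections 50000;
    # multi_accept on;
}
```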