RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Eric Robinson
Chris,

> -Original Message-
> From: Christopher Schultz 
> Sent: Friday, May 31, 2024 12:50 PM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Eric,
>
> On 5/31/24 13:44, Eric Robinson wrote:
> >> -Original Message-
> >> From: Christopher Schultz 
> >> Sent: Friday, May 31, 2024 12:38 PM
> >> To: users@tomcat.apache.org
> >> Subject: Re: Database Connection Requests Initiated but Not Sent on
> >> the Wire (Some, Not All)
> >>
> >> Mark,
> >>
> >> On 5/31/24 12:44, Mark Thomas wrote:
> >>> On 31/05/2024 16:09, Eric Robinson wrote:
> >>>> The results are looking great so far.
> >>>
> >>> Excellent.
> >>>
> >>>> Here's what we know:
> >>>>
> >>>> Before the patch, we had 2 load-balanced tomcats in production for
> >>>> this customer. Due to the driver search bottleneck, we were seeing
> >>>> hundreds of stuck threads during the slowdown periods. To work
> >>>> around this problem, we threw more tomcats at it. With 6 tomcats,
> >>>> the load was spread around enough to keep the bottleneck condition
> >>>> from manifesting badly, and users did not complain as much. We were
> >>>> still seeing dozens of stuck threads, but not hundreds.
> >>>>
> >>>> After the patch, we went back to 2 tomcats.
> >>>
> >>> I appreciate the show of faith! I think that is braver than I would
> >>> have been but it does rather confirm both the problem and the fix.
> >>>
> >>>> During the same timeframe today, there have been 1 stuck thread on
> >>>> Tomcat A and 6 on Tomcat B.
> >>>
> >>> That is great news.
> >>>
> >>>> If the numbers hold, this works out to roughly a 10,000% improvement.
> >>>
> >>> Not bad for free support ;)
> >>>
> >>> Seriously, I am glad that we seem to have tracked down the root
> >>> cause and that you have a temporary fix that works until such time
> >>> (probably the July releases) that we can figure out how we want to
> >>> address caching of "not found" classes.
> >>
> >> Yeah... this doesn't seem like a great default policy for a few reasons:
> >>
> >> 1. Maybe the classes will appear in the future? JSPs? Plugins that
> >> speculatively- load, then fail, then download/update, then try again?
> >> I'm grasping at straws a little, here.
> >>
> >> 2. Huge numbers of cache misses will cause huge numbers of cached
> >> "not found" entries. Potential DOS? I guess that would be an
> >> application bug if it's allowing huge numbers of random class-loading
> >> requests. Again, grasping at straws.
> >>
> >
> > Would it, though? I don't know what a negative cache entry would look like,
> > but it seems to me that it would not have to create duplicates.
>
> I was thinking of cache entries for large numbers of different classes, all of
> which were "not found". Not one class being requested over and over again.
> It's just lots of String keys in a hash map or whatever. Not horrific, but can
> get out of control if Something Goes Wrong.
>

Gotcha. In that case, if there were hundreds of different classes not found, 
then it seems the app must have bigger problems, like entirely missing lib 
folders or something, and the system would barely function, if at all.  I 
realize I'm speaking largely from ignorance here.

> -chris
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org

Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.




Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Terence M. Bandoian


On 5/31/2024 11:44 AM, Mark Thomas wrote:

On 31/05/2024 16:09, Eric Robinson wrote:

The results are looking great so far.


Excellent.


Here's what we know:

Before the patch, we had 2 load-balanced tomcats in production for 
this customer. Due to the driver search bottleneck, we were seeing 
hundreds of stuck threads during the slowdown periods. To work around 
this problem, we threw more tomcats at it. With 6 tomcats, the load 
was spread around enough to keep the bottleneck condition from 
manifesting badly, and users did not complain as much. We were still 
seeing dozens of stuck threads, but not hundreds.


After the patch, we went back to 2 tomcats.


I appreciate the show of faith! I think that is braver than I would 
have been but it does rather confirm both the problem and the fix.


During the same timeframe today, there have been 1 stuck thread on 
Tomcat A and 6 on Tomcat B.


That is great news.


If the numbers hold, this works out to roughly a 10,000% improvement.


Not bad for free support ;)

Seriously, I am glad that we seem to have tracked down the root cause 
and that you have a temporary fix that works until such time (probably 
the July releases) that we can figure out how we want to address 
caching of "not found" classes.


Cheers,

Mark


Kudos!

Would this mean that a "not found" class would be forever "not found" 
until a restart? How does Tomcat handle jars or classes added or removed 
while it's running now? Could a class be removed from the "not found" 
cache if a change was detected? Would it be worth maintaining a list of 
all available classes?
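
To make the "removed if a change was detected" idea concrete, here is a minimal Java sketch of a "not found" cache that is emptied whenever the set of available resources changes. It is an illustration only, not how Tomcat does or will do it: the class name InvalidatingNotFoundCache, the resourcesChanged() hook, and the Predicate standing in for the real (expensive) class search are all hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical negative cache that can be invalidated when resources change. */
final class InvalidatingNotFoundCache {
    // Names that were searched for and not found.
    private final Set<String> missing = ConcurrentHashMap.newKeySet();
    // Stand-in for the expensive search through JARs and directories.
    private final Predicate<String> search;

    InvalidatingNotFoundCache(Predicate<String> search) {
        this.search = search;
    }

    /** Returns true if the class can be found, consulting the negative cache first. */
    boolean exists(String className) {
        if (missing.contains(className)) {
            return false; // known miss: skip the expensive search
        }
        boolean found = search.test(className);
        if (!found) {
            missing.add(className);
        }
        return found;
    }

    /** Call when a JAR or class is added or removed; forgets all cached misses. */
    void resourcesChanged() {
        missing.clear();
    }
}
```

With something like this, a class is only "not found" until the next detected change, at the cost of one fresh search per class afterwards.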


Really well done!

-Terence

Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Christopher Schultz

Eric,

On 5/31/24 13:44, Eric Robinson wrote:

-Original Message-
From: Christopher Schultz 
Sent: Friday, May 31, 2024 12:38 PM
To: users@tomcat.apache.org
Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
(Some, Not All)

Mark,

On 5/31/24 12:44, Mark Thomas wrote:

On 31/05/2024 16:09, Eric Robinson wrote:

The results are looking great so far.


Excellent.


Here's what we know:

Before the patch, we had 2 load-balanced tomcats in production for
this customer. Due to the driver search bottleneck, we were seeing
hundreds of stuck threads during the slowdown periods. To work around
this problem, we threw more tomcats at it. With 6 tomcats, the load
was spread around enough to keep the bottleneck condition from
manifesting badly, and users did not complain as much. We were still
seeing dozens of stuck threads, but not hundreds.

After the patch, we went back to 2 tomcats.


I appreciate the show of faith! I think that is braver than I would
have been but it does rather confirm both the problem and the fix.


During the same timeframe today, there have been 1 stuck thread on
Tomcat A and 6 on Tomcat B.


That is great news.


If the numbers hold, this works out to roughly a 10,000% improvement.


Not bad for free support ;)

Seriously, I am glad that we seem to have tracked down the root cause
and that you have a temporary fix that works until such time (probably
the July releases) that we can figure out how we want to address
caching of "not found" classes.


Yeah... this doesn't seem like a great default policy for a few reasons:

1. Maybe the classes will appear in the future? JSPs? Plugins that
speculatively-load, then fail, then download/update, then try again? I'm
grasping at straws a little, here.

2. Huge numbers of cache misses will cause huge numbers of cached "not
found" entries. Potential DOS? I guess that would be an application bug if it's
allowing huge numbers of random class-loading requests. Again, grasping at
straws.



Would it, though? I don't know what a negative cache entry would look like, but 
it seems to me that it would not have to create duplicates.


I was thinking of cache entries for large numbers of different classes, 
all of which were "not found". Not one class being requested over and 
over again. It's just lots of String keys in a hash map or whatever. Not 
horrific, but can get out of control if Something Goes Wrong.


-chris




RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Eric Robinson
> -Original Message-
> From: Christopher Schultz 
> Sent: Friday, May 31, 2024 12:38 PM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Mark,
>
> On 5/31/24 12:44, Mark Thomas wrote:
> > On 31/05/2024 16:09, Eric Robinson wrote:
> >> The results are looking great so far.
> >
> > Excellent.
> >
> >> Here's what we know:
> >>
> >> Before the patch, we had 2 load-balanced tomcats in production for
> >> this customer. Due to the driver search bottleneck, we were seeing
> >> hundreds of stuck threads during the slowdown periods. To work around
> >> this problem, we threw more tomcats at it. With 6 tomcats, the load
> >> was spread around enough to keep the bottleneck condition from
> >> manifesting badly, and users did not complain as much. We were still
> >> seeing dozens of stuck threads, but not hundreds.
> >>
> >> After the patch, we went back to 2 tomcats.
> >
> > I appreciate the show of faith! I think that is braver than I would
> > have been but it does rather confirm both the problem and the fix.
> >
> >> During the same timeframe today, there have been 1 stuck thread on
> >> Tomcat A and 6 on Tomcat B.
> >
> > That is great news.
> >
> >> If the numbers hold, this works out to roughly a 10,000% improvement.
> >
> > Not bad for free support ;)
> >
> > Seriously, I am glad that we seem to have tracked down the root cause
> > and that you have a temporary fix that works until such time (probably
> > the July releases) that we can figure out how we want to address
> > caching of "not found" classes.
>
> Yeah... this doesn't seem like a great default policy for a few reasons:
>
> 1. Maybe the classes will appear in the future? JSPs? Plugins that
> speculatively-load, then fail, then download/update, then try again? I'm
> grasping at straws a little, here.
>
> 2. Huge numbers of cache misses will cause huge numbers of cached "not
> found" entries. Potential DOS? I guess that would be an application bug if
> it's allowing huge numbers of random class-loading requests. Again, grasping
> at straws.
>

Would it, though? I don't know what a negative cache entry would look like, but 
it seems to me that it would not have to create duplicates.

> But my spidey-sense is tingling at this one.
>
> Maybe (a) default-off option and (b) limit the total number of cached "not
> found" items to something "smallish" like 100? 1000? 1? Just enough to not
> fill the heap if something goes terribly wrong.
>
> -chris
>



Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Christopher Schultz

Mark,

On 5/31/24 12:44, Mark Thomas wrote:

On 31/05/2024 16:09, Eric Robinson wrote:

The results are looking great so far.


Excellent.


Here's what we know:

Before the patch, we had 2 load-balanced tomcats in production for 
this customer. Due to the driver search bottleneck, we were seeing 
hundreds of stuck threads during the slowdown periods. To work around 
this problem, we threw more tomcats at it. With 6 tomcats, the load 
was spread around enough to keep the bottleneck condition from 
manifesting badly, and users did not complain as much. We were still 
seeing dozens of stuck threads, but not hundreds.


After the patch, we went back to 2 tomcats.


I appreciate the show of faith! I think that is braver than I would have 
been but it does rather confirm both the problem and the fix.


During the same timeframe today, there have been 1 stuck thread on 
Tomcat A and 6 on Tomcat B.


That is great news.


If the numbers hold, this works out to roughly a 10,000% improvement.


Not bad for free support ;)

Seriously, I am glad that we seem to have tracked down the root cause 
and that you have a temporary fix that works until such time (probably 
the July releases) that we can figure out how we want to address caching 
of "not found" classes.


Yeah... this doesn't seem like a great default policy for a few reasons:

1. Maybe the classes will appear in the future? JSPs? Plugins that 
speculatively-load, then fail, then download/update, then try again? I'm 
grasping at straws a little, here.


2. Huge numbers of cache misses will cause huge numbers of cached "not 
found" entries. Potential DOS? I guess that would be an application bug 
if it's allowing huge numbers of random class-loading requests. Again, 
grasping at straws.


But my spidey-sense is tingling at this one.

Maybe (a) default-off option and (b) limit the total number of cached 
"not found" items to something "smallish" like 100? 1000? 1? Just 
enough to not fill the heap if something goes terribly wrong.
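
Something like (b) could be as small as the sketch below. This is only a rough illustration under my own assumptions, not Tomcat code: the class name NotFoundCache and its methods are hypothetical, and it simply drops the oldest entry once the limit is reached.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical size-limited cache of class names that were not found. */
final class NotFoundCache {
    private final Set<String> names;

    NotFoundCache(final int maxEntries) {
        // Insertion-ordered map that evicts its oldest entry once the
        // configured limit is exceeded, so the cache cannot fill the heap.
        Map<String, Boolean> bounded = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
        // LinkedHashMap is not thread-safe, so wrap the resulting set.
        this.names = Collections.synchronizedSet(Collections.newSetFromMap(bounded));
    }

    /** Remember that a lookup for this class name failed. */
    void recordMiss(String className) {
        names.add(className);
    }

    /** True if a previous lookup for this class name failed and is still cached. */
    boolean isKnownMissing(String className) {
        return names.contains(className);
    }

    int size() {
        return names.size();
    }
}
```

An evicted name just costs one more full search the next time it is requested, so a small limit trades a little speed for a hard cap on memory.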


-chris




Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Christopher Schultz

Eric,

On 5/31/24 11:09, Eric Robinson wrote:

The results are looking great so far.

Here's what we know:

Before the patch, we had 2 load-balanced tomcats in production for this 
customer. Due to the driver search bottleneck, we were seeing hundreds of stuck 
threads during the slowdown periods. To work around this problem, we threw more 
tomcats at it. With 6 tomcats, the load was spread around enough to keep the 
bottleneck condition from manifesting badly, and users did not complain as 
much. We were still seeing dozens of stuck threads, but not hundreds.

After the patch, we went back to 2 tomcats. During the same timeframe today, 
there have been 1 stuck thread on Tomcat A and 6 on Tomcat B.

If the numbers hold, this works out to roughly a 10,000% improvement.


This deserves a bonus, much of which you should share with markt. ;)

-chris


-Original Message-
From: Eric Robinson 
Sent: Friday, May 31, 2024 5:54 AM
To: Tomcat Users List 
Subject: RE: Database Connection Requests Initiated but Not Sent on the Wire
(Some, Not All)

Mark,


-Original Message-
From: Mark Thomas 
Sent: Thursday, May 30, 2024 9:30 AM
To: users@tomcat.apache.org
Subject: Re: Database Connection Requests Initiated but Not Sent on
the Wire (Some, Not All)

OK.

This is an interim binary patch for 9.0.80 only.

The purpose is to:
- confirm the proposed change fixes the problem
- provide you with a workaround in the short term

This is the binary patch:

https://people.apache.org/~markt/dev/classloader-not-found-cache-9.0.80-v1.zip

Extract the contents into $CATALINA_HOME/lib

You should end up with:

$CATALINA_HOME/lib/org/apache/...

Usual caveats apply. This is not an official release. Use it at your
own risk. Don't blame either me or the ASF if it results in alien
invasion, a tax bill, the server catching fire or anything else unexpected
and/or unwanted.


Longer term, I'm not sure this is exactly how I want to fix it in
Tomcat. I am convinced of the need to cache classes that don't exist
but exactly where / how to do that and what degree of control the user should
have is very much TBD.


I suspect this will be a topic of discussion at Community Over Code at
Bratislava next week.

I am expecting that any fix won't be in the June release round but
should be in the July release round.

Let us know how you get on and good luck.



The changes have been applied. We'll know at around 9:30 am EST if they have
had the desired effect. Fingers crossed!



Mark


On 30/05/2024 10:16, Mark Thomas wrote:

On 29/05/2024 17:03, Eric Robinson wrote:




One of the webapps is related to voice reminder messages that go
out to people. The reminders go out sometime after 9 am, which
tracks with the slowdowns.


Ack.

Something to try while I work on a patch is setting
archiveIndexStrategy="bloom" on the resources.

You'd configure that in META-INF/context.xml something like this:

<Context>
  <Resources archiveIndexStrategy="bloom" />
</Context>

Mark






RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Friday, May 31, 2024 11:45 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> On 31/05/2024 16:09, Eric Robinson wrote:
> > The results are looking great so far.
>
> Excellent.
>
> > Here's what we know:
> >
> > Before the patch, we had 2 load-balanced tomcats in production for this
> customer. Due to the driver search bottleneck, we were seeing hundreds of
> stuck threads during the slowdown periods. To work around this problem, we
> threw more tomcats at it. With 6 tomcats, the load was spread around enough
> to keep the bottleneck condition from manifesting badly, and users did not
> complain as much. We were still seeing dozens of stuck threads, but not
> hundreds.
> >
> > After the patch, we went back to 2 tomcats.
>
> I appreciate the show of faith! I think that is braver than I would have been
> but it does rather confirm both the problem and the fix.
>

We verified application functionality during the wee hours and examined the 
logs for evidence of any new issues that may have been introduced by the patch. 
I thought about deploying it to just one server but decided such a test would 
be inconclusive, as we could only know if it really works by allowing it to be 
stress-tested. I had a gut feeling it would be okay. 

> > During the same timeframe today, there have been 1 stuck thread on Tomcat
> A and 6 on Tomcat B.
>
> That is great news.
>
> > If the numbers hold, this works out to roughly a 10,000% improvement.
>
> Not bad for free support ;)
>

The best!

> Seriously, I am glad that we seem to have tracked down the root cause and that
> you have a temporary fix that works until such time (probably the July
> releases) that we can figure out how we want to address caching of "not found"
> classes.
>

We're eager to see how you address it permanently. In the meantime, we're 
delighted because the vendor's recommended "solution" would have been to triple 
the number of tomcat instances, which turns into a maintenance and 
troubleshooting headache. I would bet they have had other cases of slowness 
where the root cause was not isolated. This is a win-win, as it helps both us 
and them.

> Cheers,
>

Cheers back!

> Mark
>



Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Mark Thomas

On 31/05/2024 16:09, Eric Robinson wrote:

The results are looking great so far.


Excellent.


Here's what we know:

Before the patch, we had 2 load-balanced tomcats in production for this 
customer. Due to the driver search bottleneck, we were seeing hundreds of stuck 
threads during the slowdown periods. To work around this problem, we threw more 
tomcats at it. With 6 tomcats, the load was spread around enough to keep the 
bottleneck condition from manifesting badly, and users did not complain as 
much. We were still seeing dozens of stuck threads, but not hundreds.

After the patch, we went back to 2 tomcats.


I appreciate the show of faith! I think that is braver than I would have 
been but it does rather confirm both the problem and the fix.



During the same timeframe today, there have been 1 stuck thread on Tomcat A and 
6 on Tomcat B.


That is great news.


If the numbers hold, this works out to roughly a 10,000% improvement.


Not bad for free support ;)

Seriously, I am glad that we seem to have tracked down the root cause 
and that you have a temporary fix that works until such time (probably 
the July releases) that we can figure out how we want to address caching 
of "not found" classes.


Cheers,

Mark




RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Eric Robinson
The results are looking great so far.

Here's what we know:

Before the patch, we had 2 load-balanced tomcats in production for this 
customer. Due to the driver search bottleneck, we were seeing hundreds of stuck 
threads during the slowdown periods. To work around this problem, we threw more 
tomcats at it. With 6 tomcats, the load was spread around enough to keep the 
bottleneck condition from manifesting badly, and users did not complain as 
much. We were still seeing dozens of stuck threads, but not hundreds.

After the patch, we went back to 2 tomcats. During the same timeframe today, 
there have been 1 stuck thread on Tomcat A and 6 on Tomcat B.

If the numbers hold, this works out to roughly a 10,000% improvement.


> -Original Message-
> From: Eric Robinson 
> Sent: Friday, May 31, 2024 5:54 AM
> To: Tomcat Users List 
> Subject: RE: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Mark,
>
> > -Original Message-
> > From: Mark Thomas 
> > Sent: Thursday, May 30, 2024 9:30 AM
> > To: users@tomcat.apache.org
> > Subject: Re: Database Connection Requests Initiated but Not Sent on
> > the Wire (Some, Not All)
> >
> > OK.
> >
> > This is an interim binary patch for 9.0.80 only.
> >
> > The purpose is to:
> > - confirm the proposed change fixes the problem
> > - provide you with a workaround in the short term
> >
> > This is the binary patch:
> >
> > https://people.apache.org/~markt/dev/classloader-not-found-cache-9.0.80-v1.zip
> >
> > Extract the contents into $CATALINA_HOME/lib
> >
> > You should end up with:
> >
> > $CATALINA_HOME/lib/org/apache/...
> >
> > Usual caveats apply. This is not an official release. Use it at your
> > own risk. Don't blame either me or the ASF if it results in alien
> > invasion, a tax bill, the server catching fire or anything else unexpected
> > and/or unwanted.
> >
> > Longer term, I'm not sure this is exactly how I want to fix it in
> > Tomcat. I am convinced of the need to cache classes that don't exist
> > but exactly where / how to do that and what degree of control the user
> > should have is very much TBD.
> >
> > I suspect this will be a topic of discussion at Community Over Code at
> > Bratislava next week.
> >
> > I am expecting that any fix won't be in the June release round but
> > should be in the July release round.
> >
> > Let us know how you get on and good luck.
> >
>
> The changes have been applied. We'll know at around 9:30 am EST if they have
> had the desired effect. Fingers crossed!
>
>
> > Mark
> >
> >
> > On 30/05/2024 10:16, Mark Thomas wrote:
> > > On 29/05/2024 17:03, Eric Robinson wrote:
> > >
> > > 
> > >
> > >> One of the webapps is related to voice reminder messages that go
> > >> out to people. The reminders go out sometime after 9 am, which
> > >> tracks with the slowdowns.
> > >
> > > Ack.
> > >
> > > Something to try while I work on a patch is setting
> > > archiveIndexStrategy="bloom" on the resources.
> > >
> > > You'd configure that in META-INF/context.xml something like this:
> > >
> > > 
> > > 
> > >
> > > Mark
> > >
> > > 
> > > - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > > For additional commands, e-mail: users-h...@tomcat.apache.org
> > >
> >
> > -
> > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > For additional commands, e-mail: users-h...@tomcat.apache.org
>
> Disclaimer : This email and any files transmitted with it are confidential and
> intended solely for intended recipients. If you are not the named addressee 
> you
> should not disseminate, distribute, copy or alter this email. Any views or
> opinions presented in this email are solely those of the author and might not
> represent those of Physician Select Management. Warning: Although Physician
> Select Management has taken reasonable precautions to ensure no viruses are
> present in this email, the company cannot accept responsibility for any loss 
> or
> damage arising from the use of this email or attachments.
> B
> 
> CB  [  X  ܚX KK[XZ[
>
>  \ \  ][  X  ܚX P X ]
>  \X K ܙ B  ܈Y][ۘ[  [X[  K[XZ[
>
>  \ \  Z[ X ]
>  \X K ܙ B
Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.


RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-31 Thread Eric Robinson
Mark,

> -Original Message-
> From: Mark Thomas 
> Sent: Thursday, May 30, 2024 9:30 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> OK.
>
> This is an interim binary patch for 9.0.80 only.
>
> The purpose is to:
> - confirm the proposed change fixes the problem
> - provide you with a workaround in the short term
>
> This is the binary patch:
>
> https://people.apache.org/~markt/dev/classloader-not-found-cache-9.0.80-v1.zip
>
> Extract the contents into $CATALINA_HOME/lib
>
> You should end up with:
>
> $CATALINA_HOME/lib/org/apache/...
>
> Usual caveats apply. This is not an official release. Use it at your own risk.
> Don't blame either me or the ASF if it results in alien invasion, a tax bill,
> the server catching fire or anything else unexpected and/or unwanted.
>
> Longer term, I'm not sure this is exactly how I want to fix it in Tomcat. I am
> convinced of the need to cache classes that don't exist but exactly where / how
> to do that and what degree of control the user should have is very much TBD.
>
> I suspect this will be a topic of discussion at Community Over Code at
> Bratislava next week.
>
> I am expecting that any fix won't be in the June release round but should be in
> the July release round.
>
> Let us know how you get on and good luck.
>

The changes have been applied. We'll know at around 9:30 am EST if they have 
had the desired effect. Fingers crossed!


> Mark
>
>
> On 30/05/2024 10:16, Mark Thomas wrote:
> > On 29/05/2024 17:03, Eric Robinson wrote:
> >
> > 
> >
> >> One of the webapps is related to voice reminder messages that go out
> >> to people. The reminders go out sometime after 9 am, which tracks
> >> with the slowdowns.
> >
> > Ack.
> >
> > Something to try while I work on a patch is setting
> > archiveIndexStrategy="bloom" on the resources.
> >
> > You'd configure that in META-INF/context.xml something like this:
> >
> > <Context>
> >   <Resources archiveIndexStrategy="bloom" />
> > </Context>
> >
> > Mark
> >


RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-30 Thread Eric Robinson
Hi Mark,

> -Original Message-
> From: Mark Thomas 
> Sent: Thursday, May 30, 2024 9:30 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> OK.
>
> This is an interim binary patch for 9.0.80 only.
>
> The purpose is to:
> - confirm the proposed change fixes the problem
> - provide you with a workaround in the short term
>
> This is the binary patch:
>
> https://people.apache.org/~markt/dev/classloader-not-found-cache-9.0.80-v1.zip
>
> Extract the contents into $CATALINA_HOME/lib
>
> You should end up with:
>
> $CATALINA_HOME/lib/org/apache/...
>

I'll get on this right away.

> Usual caveats apply. This is not an official release. Use it at your own 
> risk. Don't
> blame either me or the ASF if it results in alien invasion, a tax bill, the 
> server
> catching fire or anything else unexpected and/or unwanted.
>

Okay, but if we're invaded by alien tax collectors riding flaming servers, THEN 
I'm coming after you.

> Longer term, I'm not sure this is exactly how I want to fix it in Tomcat. I am
> convinced of the need to cache classes that don't exist but exactly where / 
> how
> to do that and what degree of control the user should have is very much TBD.
>
> I suspect this will be a topic of discussion at Community Over Code at 
> Bratislava
> next week.
>
> I am expecting that any fix won't be in the June release round but should be 
> in
> the July release round.
>
> Let us know how you get on and good luck.
>

Will do!


> Mark
>
>
> On 30/05/2024 10:16, Mark Thomas wrote:
> > On 29/05/2024 17:03, Eric Robinson wrote:
> >
> > 
> >
> >> One of the webapps is related to voice reminder messages that go out
> >> to people. The reminders go out sometime after 9 am, which tracks
> >> with the slowdowns.
> >
> > Ack.
> >
> > Something to try while I work on a patch is setting
> > archiveIndexStrategy="bloom" on the resources.
> >
> > You'd configure that in META-INF/context.xml something like this:
> >
> > <Context>
> >    <Resources archiveIndexStrategy="bloom" />
> > </Context>
> >
> > Mark
> >


Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-30 Thread Konstantin Kolinko
Wed, 29 May 2024 at 13:34, Mark Thomas :
>
>
>
> It is also problem number 3. The reason it is expensive is that class
> loaders don't cache misses so if a web application has a large number of
> JARs, they all get scanned every time the DriverManager tries to create
> a new connection.
>
>
> The slowness occurs in the web application that depends on the second
> JDBC driver in DriverManager's list. When a request that requires a
> database connection is received, there is a short delay while the web
> application tries, and fails, to load the first JDBC driver in the list.
> Class loading is synchronized on class name being loaded so if any other
> requests also need a database connection, they have to wait for this
> request to finish the search for the JDBC driver before they can
> continue. This creates a bottleneck. Requests are essentially rate
> limited to 1 request that requires a database connection per however
> long it takes to scan every JAR in the web application for a class that
> isn't there. If the average rate of requests exceeds this rate limit
> then a queue is going to build up and it won't subside until the average
> rate of requests falls below this rate limit.
>
> [...]
>
> Problem number 3 is a Tomcat issue. It should be relatively easy to
> start caching misses (i.e. this class loader cannot load this class) and
> save the time spent repeatedly scanning JARs for a class that isn't there.
>

(I wonder if unpacking all JARs into the WEB-INF/classes directory will help.)

> Something to try while I work on a patch is setting
> archiveIndexStrategy="bloom" on the resources.
>
> You'd configure that in META-INF/context.xml something like this:
>
> <Context>
>    <Resources archiveIndexStrategy="bloom" />
> </Context>

+1 for archiveIndexStrategy="bloom".

https://tomcat.apache.org/tomcat-9.0-doc/config/resources.html

The option is available since Tomcat 9.0.69 and thus should be OK here.
(Earlier versions had the feature since 9.0.39, but it was configured
through an attribute of Context:
https://bz.apache.org/bugzilla/show_bug.cgi?id=66209)

Best regards,
Konstantin Kolinko




Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-30 Thread Mark Thomas

OK.

This is an interim binary patch for 9.0.80 only.

The purpose is to:
- confirm the proposed change fixes the problem
- provide you with a workaround in the short term

This is the binary patch:

https://people.apache.org/~markt/dev/classloader-not-found-cache-9.0.80-v1.zip

Extract the contents into $CATALINA_HOME/lib

You should end up with:

$CATALINA_HOME/lib/org/apache/...

Usual caveats apply. This is not an official release. Use it at your own 
risk. Don't blame either me or the ASF if it results in alien invasion, 
a tax bill, the server catching fire or anything else unexpected and/or 
unwanted.


Longer term, I'm not sure this is exactly how I want to fix it in 
Tomcat. I am convinced of the need to cache classes that don't exist but 
exactly where / how to do that and what degree of control the user 
should have is very much TBD.


I suspect this will be a topic of discussion at Community Over Code at 
Bratislava next week.


I am expecting that any fix won't be in the June release round but 
should be in the July release round.


Let us know how you get on and good luck.

Mark


On 30/05/2024 10:16, Mark Thomas wrote:

On 29/05/2024 17:03, Eric Robinson wrote:



One of the webapps is related to voice reminder messages that go out 
to people. The reminders go out sometime after 9 am, which tracks with 
the slowdowns.


Ack.

Something to try while I work on a patch is setting 
archiveIndexStrategy="bloom" on the resources.


You'd configure that in META-INF/context.xml something like this:

<Context>
   <Resources archiveIndexStrategy="bloom" />
</Context>

Mark




Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-30 Thread Christopher Schultz

Eric,

On 5/29/24 12:10, Eric Robinson wrote:
>> -Original Message-
>> From: Mark Thomas
>> Sent: Wednesday, May 29, 2024 10:19 AM
>> To: users@tomcat.apache.org
>> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
>> (Some, Not All)
>>
>> On 29/05/2024 16:08, Eric Robinson wrote:
>>
>>> I believe your assessment is correct. How hard is it to enable pooling?
>>> Can it be bolted on, so to speak, through changes to the app context, such
>>> that the webapp itself does not necessarily need to implement special code?
>>
>> It looks like - from the database configuration you provided earlier - there
>> is an option to configure the database via JNDI. If you do that with Tomcat
>> you will automatically get pooling. That might be something to follow up
>> with the vendor. If you go that route, I'd recommend configuring the pool to
>> remove abandoned connections to avoid any issues with connection leaks.
>
> In reviewing live threads with Visual VM, I note that there are apparently
> threads related to cleaning up abandoned connections, and maybe even pooling?
>
> The threads are:
>
> mysql-cj-abandoned-connection-cleanup (2 of those)

This thread is started by the MySQL driver to clean up certain resources 
and isn't related to connection pooling. I've had issues with these 
threads not shutting down on application-stop in certain versions of 
MySQL's driver. :/

> OkHttp Connection Pool (2 of those)
> OkHttp https://ps.pndsn.com (not sure what that is)

OkHttp is an HTTP client library with its own pool of HTTP connections. 
It's not related to DB connection pooling.

Most database connection pools do not require any extra threads, so you 
wouldn't necessarily find evidence of one in a thread list.


-chris




Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-30 Thread Mark Thomas

On 29/05/2024 17:03, Eric Robinson wrote:




One of the webapps is related to voice reminder messages that go out to people. 
The reminders go out sometime after 9 am, which tracks with the slowdowns.


Ack.

Something to try while I work on a patch is setting 
archiveIndexStrategy="bloom" on the resources.


You'd configure that in META-INF/context.xml something like this:

<Context>
  <Resources archiveIndexStrategy="bloom" />
</Context>

Mark




RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Eric Robinson

> -Original Message-
> From: Mark Thomas 
> Sent: Wednesday, May 29, 2024 10:19 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> On 29/05/2024 16:08, Eric Robinson wrote:
>
> > I believe your assessment is correct. How hard is it to enable pooling? Can 
> > it
> be bolted on, so to speak, through changes to the app context, such that the
> webapp itself does not necessarily need to implement special code?
>
> It looks like - from the database configuration you provided earlier - there 
> is an
> option to configure the database via JNDI. If you do that with Tomcat you will
> automatically get pooling. That might be something to follow up with the
> vendor. If you go that route, I'd recommend configuring the pool to remove
> abandoned connections to avoid any issues with connection leaks.
>

In reviewing live threads with Visual VM, I note that there are apparently 
threads related to cleaning up abandoned connections, and maybe even pooling?

The threads are:

mysql-cj-abandoned-connection-cleanup (2 of those)
OkHttp Connection Pool (2 of those)
OkHttp https://ps.pndsn.com (not sure what that is)


> Not sure if all the web applications support a JNDI based configuration.
>
> 
>
> > Would the problem be relieved if the vendor stuck to one driver?
>
> Yes. That would avoid the attempt to load the "other" driver which is causing
> the delay.
>
> Mark
>


RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Eric Robinson
Hi Mark,


> -Original Message-
> From: Mark Thomas 
> Sent: Wednesday, May 29, 2024 10:10 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> On 29/05/2024 13:38, Eric Robinson wrote:
> >> -Original Message-
> >> From: Mark Thomas 
>
> 
>
> >> I intend to work on a patch for Tomcat that will add caching that
> >> should speed things up considerably. I hope to have something for
> >> Eric to test today but it might take me until tomorrow as I have a
> >> few other time-critical things fighting to get to the top of my TODO
> >> list at the moment.
> >>
> >>
> >> Moving the JDBC driver JARs from WEB-INF/lib to $CATALINA_BASE/lib
> >> may also be a short-term fix but is likely to create problems if the
> >> same JAR ever exists in both locations at the same time.
>
> Just an FYI. On further reflection, moving the JDBC driver JARs isn't going to
> help. Sorry. You'll need my fix.
>
> Assuming, of course, you are willing to test a patch to address this on a
> production system.
>

Absolutely. We and the users are ready to do what it takes.

> > That's some great sleuthing and the explanation makes a ton of sense. It
> leaves me with a couple of questions.
> >
> > If you are correct, then it follows that historic activity has been hovering
> dangerously near the threshold where this symptom would manifest. Within the
> past month, an unknown change in the system climate now causes an uptick in
> the number of DB requests/second at roughly the same time daily (with
> occasional exceptions) and the system begins to trip over its own feet. I 
> haven't
> seen anything in my Zabbix graphs that stood out as potentially problematic.
> Armed with this information, I am now taking a closer look.
>
> Ack.
>
> > The natural next question is, what changed in the application or the users'
> workflow to push activity over the threshold? We'll dig into that.
>
> Could be all sorts of things.
>
> It might just have been coincidence the first time and now the users all 
> request
> the data they need at the start of their day in case the problem happens 
> again.
> And by doing that they cause the very problem they are trying to avoid.
>

One of the webapps is related to voice reminder messages that go out to people. 
The reminders go out sometime after 9 am, which tracks with the slowdowns.

> Mark
>


Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Mark Thomas

On 29/05/2024 16:08, Eric Robinson wrote:


I believe your assessment is correct. How hard is it to enable pooling? Can it 
be bolted on, so to speak, through changes to the app context, such that the 
webapp itself does not necessarily need to implement special code?


It looks like - from the database configuration you provided earlier - 
there is an option to configure the database via JNDI. If you do that 
with Tomcat you will automatically get pooling. That might be something 
to follow up with the vendor. If you go that route, I'd recommend 
configuring the pool to remove abandoned connections to avoid any issues 
with connection leaks.
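
For readers unfamiliar with the JNDI route, the following is a minimal sketch of what the application side of such a setup typically looks like: the webapp asks the container for a DataSource by name instead of calling DriverManager itself. The class name and the JNDI name "java:comp/env/jdbc/ExampleDB" are hypothetical, and whether the vendor's applications actually do this is not known from this thread.

```java
import java.util.Optional;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class JndiDataSourceSketch {

    /** Hypothetical JNDI name; the real one depends on how the resource is declared. */
    static final String JNDI_NAME = "java:comp/env/jdbc/ExampleDB";

    /** Returns the container-managed DataSource, or empty when none can be looked up. */
    static Optional<DataSource> lookup(String jndiName) {
        try {
            Object found = new InitialContext().lookup(jndiName);
            return found instanceof DataSource
                    ? Optional.of((DataSource) found)
                    : Optional.empty();
        } catch (NamingException e) {
            // Outside a container (for example in a plain main method) there is
            // no JNDI environment, so the lookup fails and we report "empty".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(JNDI_NAME).isPresent()
                ? "container-managed DataSource found"
                : "no JNDI DataSource available here");
    }
}
```

Inside the container, connections obtained from such a DataSource are borrowed from and returned to the pool, which is what removes the per-request connection creation discussed in this thread.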


Not sure if all the web applications support a JNDI based configuration.




Would the problem be relieved if the vendor stuck to one driver?


Yes. That would avoid the attempt to load the "other" driver which is 
causing the delay.


Mark




Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Mark Thomas

On 29/05/2024 13:38, Eric Robinson wrote:
>> -Original Message-
>> From: Mark Thomas
>>
>> I intend to work on a patch for Tomcat that will add caching that should
>> speed things up considerably. I hope to have something for Eric to test
>> today but it might take me until tomorrow as I have a few other
>> time-critical things fighting to get to the top of my TODO list at the
>> moment.
>>
>> Moving the JDBC driver JARs from WEB-INF/lib to $CATALINA_BASE/lib may
>> also be a short-term fix but is likely to create problems if the same
>> JAR ever exists in both locations at the same time.

Just an FYI. On further reflection, moving the JDBC driver JARs isn't 
going to help. Sorry. You'll need my fix.

Assuming, of course, you are willing to test a patch to address this on 
a production system.

> That's some great sleuthing and the explanation makes a ton of sense. It
> leaves me with a couple of questions.
>
> If you are correct, then it follows that historic activity has been hovering
> dangerously near the threshold where this symptom would manifest. Within the
> past month, an unknown change in the system climate now causes an uptick in
> the number of DB requests/second at roughly the same time daily (with
> occasional exceptions) and the system begins to trip over its own feet. I
> haven't seen anything in my Zabbix graphs that stood out as potentially
> problematic. Armed with this information, I am now taking a closer look.

Ack.

> The natural next question is, what changed in the application or the users'
> workflow to push activity over the threshold? We'll dig into that.

Could be all sorts of things.

It might just have been coincidence the first time and now the users all 
request the data they need at the start of their day in case the problem 
happens again. And by doing that they cause the very problem they are 
trying to avoid.


Mark




RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Eric Robinson
Mark,

A few other thoughts come to mind. See below.

> -Original Message-
> From: Eric Robinson 
> Sent: Wednesday, May 29, 2024 7:39 AM
> To: Tomcat Users List 
> Subject: RE: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hi Mark,
>
>
> > -Original Message-
> > From: Mark Thomas 
> > Sent: Wednesday, May 29, 2024 5:35 AM
> > To: users@tomcat.apache.org
> > Subject: Re: Database Connection Requests Initiated but Not Sent on
> > the Wire (Some, Not All)
> >
> > On 29/05/2024 10:26, Mark Thomas wrote:
> > > On 28/05/2024 16:26, Eric Robinson wrote:
> > >
> > > 
> > >
> > >> Took a bunch of thread and heap dumps during today's painful debacle.
> > >> Will send a link to those as soon as I can.
> > >
> > > Thanks. I have them. I have taken a look and I am starting to form a
> > > theory. To help with that I have a couple of questions.
> >
> > Scratch that. I've found some further information in the data Eric
> > sent me off- list and I am now pretty sure what is going on.
> >
> > There are multiple web applications deployed on the servers. I assume
> > they are related but it actually doesn't matter.
> >
> > At least one application is using the "new" MySQL JDBC driver:
> > com.mysql.cj.jdbc.Driver
> >
> > At least one application is using the "old" MySQL JDBC driver:
> > com.mysql.jdbc.Driver
> >
> >
> > (I've told Eric off-list which application is using which).
> >
> > There are, therefore, two drivers registered with the
> > java.sql.DriverManager
> >
> >
> > The web applications are not using connection pooling. Or, if they are
> > using it, they are using it very inefficiently. The result is that
> > there is a high volume of calls to create new database connections.
> >
> > This is problem number 1. Creating a database connection is expensive.
> > That is why the concept of database connection pooling was created.
> >
> >
> > When a new connection is created, java.sql.DriverManager iterates over
> > the list of registered drivers and
> > - tests to see if the current class loader can see the driver
> > - if yes, tests to see if that driver can service the connection url
> > - if yes, use it and exit
> > - go on to the next driver in the list and repeat
> >
> > The test to see if the current class loader can use the driver is,
> > essentially, to call
> > Class.forName(driver.getClass().getName(), true, classloader)
> >
> > And that is problem number 2. That check is expensive if the current
> > class loader can't load that driver.
> >
> >
> > It is also problem number 3. The reason it is expensive is that class
> > loaders don't cache misses so if a web application has a large number
> > of JARs, they all get scanned every time the DriverManager tries to
> > create a new connection.
> >

Maybe a potential solution is to have the class loader cache misses? Wait, I 
see you answered that further down...

> >
> > The slowness occurs in the web application that depends on the second
> > JDBC driver in DriverManager's list. When a request that requires a
> > database connection is received, there is a short delay while the web
> > application tries, and fails, to load the first JDBC driver in the list.
> > Class loading is synchronized on class name being loaded so if any
> > other requests also need a database connection, they have to wait for
> > this request to finish the search for the JDBC driver before they can
> > continue. This creates a bottleneck. Requests are essentially rate
> > limited to 1 request that requires a database connection per however
> > long it takes to scan every JAR in the web application for a class
> > that isn't there. If the average rate of requests exceeds this rate
> > limit then a queue is going to build up and it won't subside until the
> > average rate of requests falls below this rate limit.
> >
> >
> >
> > Problem number 1 is an application issue. It should be using pooling.
> > It seems unlikely that we'll see a solution from the application
> > vendor and
> > - even if the vendor does commit to a fix - I suspect it will take months.
> >

I believe your assessment is correct. How hard is it to enable pooling? Can it 
be bolted on, so to speak, through changes to the app context, such that the 
webapp itself does not necessarily need to implement special code?

> >
> > Problem number 2 is a JRE issue. I think there are potentially more
> > efficient ways to perform that check but that needs research as things
> > like OSGI and JPMS make class loading more complicated.
> >
> >
> > Problem number 3 is a Tomcat issue. It should be relatively easy to
> > start caching misses (i.e. this class loader cannot load this class)
> > and save the time spent repeatedly scanning JARs for a class that isn't 
> > there.
> >
> >
> > I intend to work on a patch for Tomcat that will add caching that
> > should speed things up considerably. I hope to have something for Eric
> > to test today but it might take me until tomorrow as I have a few

RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Eric Robinson
Hi Mark,


> -Original Message-
> From: Mark Thomas 
> Sent: Wednesday, May 29, 2024 5:35 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> On 29/05/2024 10:26, Mark Thomas wrote:
> > On 28/05/2024 16:26, Eric Robinson wrote:
> >
> > 
> >
> >> Took a bunch of thread and heap dumps during today's painful debacle.
> >> Will send a link to those as soon as I can.
> >
> > Thanks. I have them. I have taken a look and I am starting to form a
> > theory. To help with that I have a couple of questions.
>
> Scratch that. I've found some further information in the data Eric sent me 
> off-
> list and I am now pretty sure what is going on.
>
> There are multiple web applications deployed on the servers. I assume they
> are related but it actually doesn't matter.
>
> At least one application is using the "new" MySQL JDBC driver:
> com.mysql.cj.jdbc.Driver
>
> At least one application is using the "old" MySQL JDBC driver:
> com.mysql.jdbc.Driver
>
>
> (I've told Eric off-list which application is using which).
>
> There are, therefore, two drivers registered with the java.sql.DriverManager
>
>
> The web applications are not using connection pooling. Or, if they are using 
> it,
> they are using it very inefficiently. The result is that there is a high 
> volume of
> calls to create new database connections.
>
> This is problem number 1. Creating a database connection is expensive.
> That is why the concept of database connection pooling was created.
>
>
> When a new connection is created, java.sql.DriverManager iterates over the 
> list
> of registered drivers and
> - tests to see if the current class loader can see the driver
> - if yes, tests to see if that driver can service the connection url
> - if yes, use it and exit
> - go on to the next driver in the list and repeat
>
> The test to see if the current class loader can use the driver is,
> essentially, to call
> Class.forName(driver.getClass().getName(), true, classloader)
>
> And that is problem number 2. That check is expensive if the current class
> loader can't load that driver.
>
>
> It is also problem number 3. The reason it is expensive is that class
> loaders don't cache misses so if a web application has a large number of
> JARs, they all get scanned every time the DriverManager tries to create
> a new connection.
>
>
> The slowness occurs in the web application that depends on the second
> JDBC driver in DriverManager's list. When a request that requires a
> database connection is received, there is a short delay while the web
> application tries, and fails, to load the first JDBC driver in the list.
> Class loading is synchronized on class name being loaded so if any other
> requests also need a database connection, they have to wait for this
> request to finish the search for the JDBC driver before they can
> continue. This creates a bottleneck. Requests are essentially rate
> limited to 1 request that requires a database connection per however
> long it takes to scan every JAR in the web application for a class that
> isn't there. If the average rate of requests exceeds this rate limit
> then a queue is going to build up and it won't subside until the average
> rate of requests falls below this rate limit.
>
>
>
> Problem number 1 is an application issue. It should be using pooling. It
> seems unlikely that we'll see a solution from the application vendor and
> - even if the vendor does commit to a fix - I suspect it will take months.
>
>
> Problem number 2 is a JRE issue. I think there are potentially more
> efficient ways to perform that check but that needs research as things
> like OSGI and JPMS make class loading more complicated.
>
>
> Problem number 3 is a Tomcat issue. It should be relatively easy to
> start caching misses (i.e. this class loader cannot load this class) and
> save the time spent repeatedly scanning JARs for a class that isn't there.
>
>
> I intend to work on a patch for Tomcat that will add caching that should
> speed things up considerably. I hope to have something for Eric to test
> today but it might take me until tomorrow as I have a few other
> time-critical things fighting to get to the top of my TODO list at the
> moment.
>
>
> Moving the JDBC driver JARs from WEB-INF/lib to $CATALINA_BASE/lib may
> also be a short-term fix but is likely to create problems if the same
> JAR ever exists in both locations at the same time.
>
>
> Mark
>

That's some great sleuthing and the explanation makes a ton of sense. It leaves 
me with a couple of questions.

If you are correct, then it follows that historic activity has been hovering 
dangerously near the threshold where this symptom would manifest. Within the 
past month, an unknown change in the system climate now causes an uptick in the 
number of DB requests/second at roughly the same time daily (with occasional 
exceptions) and the system begins to trip over its own feet. I haven't seen 

Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Mark Thomas

On 29/05/2024 10:26, Mark Thomas wrote:

On 28/05/2024 16:26, Eric Robinson wrote:



Took a bunch of thread and heap dumps during today's painful debacle. 
Will send a link to those as soon as I can.


Thanks. I have them. I have taken a look and I am starting to form a 
theory. To help with that I have a couple of questions.


Scratch that. I've found some further information in the data Eric sent 
me off-list and I am now pretty sure what is going on.


There are multiple web applications deployed on the servers. I assume 
they are related but it actually doesn't matter.


At least one application is using the "new" MySQL JDBC driver:
com.mysql.cj.jdbc.Driver

At least one application is using the "old" MySQL JDBC driver:
com.mysql.jdbc.Driver


(I've told Eric off-list which application is using which).

There are, therefore, two drivers registered with the java.sql.DriverManager


The web applications are not using connection pooling. Or, if they are 
using it, they are using it very inefficiently. The result is that there 
is a high volume of calls to create new database connections.


This is problem number 1. Creating a database connection is expensive. 
That is why the concept of database connection pooling was created.



When a new connection is created, java.sql.DriverManager iterates over 
the list of registered drivers and

- tests to see if the current class loader can see the driver
- if yes, tests to see if that driver can service the connection url
- if yes, use it and exit
- go on to the next driver in the list and repeat

The test to see if the current class loader can use the driver is, 
essentially, to call 
Class.forName(driver.getClass().getName(), true, classloader)
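
The lookup described above can be sketched in a few lines of Java. This is an illustration only, not the JDK's actual DriverManager code: the class and method names are made up, plain objects stand in for registered JDBC drivers, and the "can this driver service the URL" step is left out so the example runs without any driver on the class path.

```java
import java.util.ArrayList;
import java.util.List;

public class DriverVisibilitySketch {

    /** Stand-in for a JDBC driver that only the application class loader can see. */
    static class AppOnlyDriver { }

    /**
     * Visibility test as described in the message: ask the caller's class loader
     * to load the driver's class by name and check that it yields the same Class.
     */
    static boolean isVisible(Object driver, ClassLoader callerLoader) {
        try {
            Class<?> seen = Class.forName(driver.getClass().getName(), true, callerLoader);
            return seen == driver.getClass();
        } catch (ClassNotFoundException e) {
            // A miss: the loader searched everywhere it knows and found nothing.
            return false;
        }
    }

    /** Walks the registered list in order and returns the first visible driver, or null. */
    static Object firstVisible(List<Object> registered, ClassLoader callerLoader) {
        for (Object driver : registered) {
            if (isVisible(driver, callerLoader)) {
                return driver;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Object> registered = new ArrayList<>();
        registered.add(new AppOnlyDriver());  // first in the list
        registered.add("stand-in driver");    // java.lang.String, visible to every loader

        ClassLoader appLoader = DriverVisibilitySketch.class.getClassLoader();
        ClassLoader bootOnly = new ClassLoader(null) { };  // sees JDK classes only

        System.out.println(firstVisible(registered, appLoader).getClass().getSimpleName());
        System.out.println(firstVisible(registered, bootOnly).getClass().getSimpleName());
    }
}
```

With the boot-only loader, the first entry is a miss and the loop only succeeds on the second entry. That failed first probe is the step that becomes expensive when the loader has many JARs to search.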


And that is problem number 2. That check is expensive if the current 
class loader can't load that driver.



It is also problem number 3. The reason it is expensive is that class 
loaders don't cache misses so if a web application has a large number of 
JARs, they all get scanned every time the DriverManager tries to create 
a new connection.



The slowness occurs in the web application that depends on the second 
JDBC driver in DriverManager's list. When a request that requires a 
database connection is received, there is a short delay while the web 
application tries, and fails, to load the first JDBC driver in the list. 
Class loading is synchronized on class name being loaded so if any other 
requests also need a database connection, they have to wait for this 
request to finish the search for the JDBC driver before they can 
continue. This creates a bottleneck. Requests are essentially rate 
limited to 1 request that requires a database connection per however 
long it takes to scan every JAR in the web application for a class that 
isn't there. If the average rate of requests exceeds this rate limit 
then a queue is going to build up and it won't subside until the average 
rate of requests falls below this rate limit.
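
To make the rate-limit argument concrete, here is a small worked calculation. All numbers are hypothetical (a 50 ms scan, 30 or 15 requests per second); nothing in this thread states the real scan time or request rate. The class and method names are likewise made up for illustration.

```java
public class DriverScanQueueSketch {

    /** Upper bound on connection requests per second when each one holds the lock for scanMillis. */
    static double maxRequestsPerSecond(double scanMillis) {
        return 1000.0 / scanMillis;
    }

    /** Requests still waiting after the given number of seconds of steady arrivals; never negative. */
    static double backlogAfter(double arrivalsPerSecond, double scanMillis, double seconds) {
        double surplus = arrivalsPerSecond - maxRequestsPerSecond(scanMillis);
        return Math.max(0.0, surplus * seconds);
    }

    public static void main(String[] args) {
        double scanMillis = 50.0;  // hypothetical time to scan every JAR for a missing class

        System.out.println(maxRequestsPerSecond(scanMillis));      // 20.0
        System.out.println(backlogAfter(30.0, scanMillis, 60.0));  // 600.0
        System.out.println(backlogAfter(15.0, scanMillis, 60.0));  // 0.0
    }
}
```

Under these assumed numbers the ceiling is 20 connection requests per second. A steady 30 per second leaves 600 requests queued after one minute, while 15 per second never builds a queue, which matches the "fine most of the day, stuck threads at the busy time" pattern.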




Problem number 1 is an application issue. It should be using pooling. It 
seems unlikely that we'll see a solution from the application vendor and 
- even if the vendor does commit to a fix - I suspect it will take months.



Problem number 2 is a JRE issue. I think there are potentially more 
efficient ways to perform that check but that needs research as things 
like OSGI and JPMS make class loading more complicated.



Problem number 3 is a Tomcat issue. It should be relatively easy to 
start caching misses (i.e. this class loader cannot load this class) and 
save the time spent repeatedly scanning JARs for a class that isn't there.
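
The idea of caching misses can be sketched in a few lines of Java. This is only an illustration under stated assumptions and is not the Tomcat patch: MissCachingLoader and scanJars are hypothetical names, scanJars is a placeholder that always fails, and a real class loader would also need to invalidate the cache if its resources could change.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative class loader that remembers names it has already failed to find. */
final class MissCachingLoader extends ClassLoader {
    private final Set<String> notFound = ConcurrentHashMap.newKeySet();
    /** Counts how often the (simulated) expensive scan actually ran. */
    final AtomicInteger scans = new AtomicInteger();

    MissCachingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        if (notFound.contains(name)) {
            // Known miss: skip the scan entirely.
            throw new ClassNotFoundException(name + " (cached miss)");
        }
        Class<?> found = scanJars(name);
        if (found == null) {
            notFound.add(name);
            throw new ClassNotFoundException(name);
        }
        return found;
    }

    /** Placeholder for the slow search through every JAR; always misses in this sketch. */
    private Class<?> scanJars(String name) {
        scans.incrementAndGet();
        return null;
    }
}
```

Asking this loader for the same missing class repeatedly triggers the simulated scan only once, which is the saving described above.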



I intend to work on a patch for Tomcat that will add caching that should 
speed things up considerably. I hope to have something for Eric to test 
today but it might take me until tomorrow as I have a few other time 
critical things fighting to get to the top of my TODO list at the moment.



Moving the JDBC driver JARs from WEB-INF/lib to $CATALINA_BASE/lib may 
also be a short-term fix but is likely to create problems if the same 
JAR ever exists in both locations at the same time.



Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Mark Thomas

On 28/05/2024 16:26, Eric Robinson wrote:




Took a bunch of thread and heap dumps during today's painful debacle. Will send 
a link to those as soon as I can.


Thanks. I have them. I have taken a look and I am starting to form a 
theory. To help with that I have a couple of questions.


1. Could you tell me where the JDBC driver JAR is located. Is it in 
WEB-INF/lib for the web application(s) or is it in $CATALINA_BASE/lib ?


2. How big is WEB-INF/lib for the web application(s)? How many JAR files 
and what is the total size on disk of that directory?


3. Would you be prepared to run Tomcat in production with a binary patch 
(against 9.0.80). This would involve placing one or more class files in 
the right directory structure under $CATALINA_BASE/lib either to collect 
additional debug logging or to test a potential fix.




Mark




RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-28 Thread Eric Robinson
Hi Mark,

See comments below.


> -Original Message-
> From: Mark Thomas 
> Sent: Tuesday, May 28, 2024 9:32 AM
> To: Tomcat Users List 
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hi Eric,
>
> Follow-up observations and comments in-line.
>
> >> What time does this problem start?
> >
> > It typically starts around 9:15 am EDT and goes until around 10:30 am.
>
> Does that match the time of highest request load from the customer?
> Rather than a spike, I'm wondering if the problem is triggered once load
> exceeds some threshold.
>

My nginx proxy console only shows live activity and does not keep a history, 
but I can probably script something to parse the localhost_access logs and 
graph request counts on a per-minute basis. Will work on that.
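
A per-minute count like the one described could be sketched as follows in Java. It assumes access log lines carry a bracketed timestamp such as [28/May/2024:09:15:32 -0400]; check the format against the actual AccessLogValve pattern in use. The class name RequestsPerMinute is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts access log lines per minute; the timestamp format is an assumption. */
final class RequestsPerMinute {
    // Captures "dd/MMM/yyyy:HH:mm" and ignores the seconds that follow.
    private static final Pattern MINUTE =
            Pattern.compile("\\[(\\d{2}/\\w{3}/\\d{4}:\\d{2}:\\d{2}):\\d{2} ");

    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps log order
        for (String line : lines) {
            Matcher m = MINUTE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The lines could come from java.nio.file.Files.readAllLines on the relevant access log file, and the resulting map can be written out as CSV for graphing.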

> > We finished and implemented the script yesterday, so today will be the first
> > day that it produces results. It watches the catalina.out file for stuck
> > thread detection warnings. When the number of stuck threads exceeds a
> > threshold, then it starts doing thread dumps every 60 seconds until the count
> > drops back down below the threshold. The users typically do not complain of
> > slowness until the stuck thread count exceeds 20, and during that time the
> > threads often take up to a minute or more to complete. It's too late today to
> > change the timings, but if it does not produce any actionable intel, we can
> > adjust them tonight.
>
> Let's see what that produces and go from there.
>

Took a bunch of thread and heap dumps during today's painful debacle. Will send 
a link to those as soon as I can.

> > The vendor claims that the feature uses a different server and does not send
> > requests to the slow ones, so it has been re-enabled at the customer's
> > request. We may ask them to disable it again until we get this issue resolved.
>
> Noted.
>
> > This customer sends about 1.5 million requests to each load-balanced
> > server during a typical production day. Most other customers send much
> > less, often only a fraction of that. However, at least one customer
> > sends about 2 million to the same server, and they don't see the
> > problem. (I will check if they have the AI feature enabled.)
>
> Hmm. Whether that other customer has the AI feature enabled would be an
> interesting data point.

I will ask them right after I send this message. They are usually a little slow 
to respond.

>
> >> Can we see the full stack trace please.
> >
> > Here's one example.
>
> 
>
> >  java.lang.Throwable
> >  at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1252)
> >  at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1220)
>
> 
>
> That is *very* interesting. That is the start of a synch block in the class
> loader. It should complete quickly. The full thread dump should tell us what
> is holding the lock. If we are lucky we'll be able to tell why the lock is
> being held for so long.
>
> We might need to reduce the time between thread dumps to figure out what
> the thread that is blocking everything is doing. We'll see.
>
> > The app has DB connection details in two places. First, it uses a database
> > connection string in a .properties file, as follows. This string handles most
> > connections to the DB.
> >
> > mobiledoc.DBUrl=jdbc:mysql://ha52a:5791
> >
> mobiledoc.DBName=mobiledoc_791?useSSL=false=rou
> nd
> >
> =false=true
> > esOnException=true=false=tru
> > e=true
> > mobiledoc.DBUser=
> > mobiledoc.DBPassword=
>
> OK. That seems unlikely to be using connection pooling although the
> application might be pooling internally.
>

Based on lots of previous observation, I don't think they are. The comms 
between the app and DB are choppy, with only about 1-5 queries per TCP 
connection. If they are pooling, they are not doing it aggressively.

> > It also has second DB config specifically for a drug database.
> >
> > 
> >
> >
> >  
> >  
> >  
> >  c:\out.log
> >
> >
> >
> >
> >
> >  
> >
> INSERT_CONTEXT_FACTORY FACTORY>
> >  INSERT_JNDI_URL
> >  INSERT_USER_NAME
> >  INSERT_PASSWORD
> >  INSERT_LOOKUP_NAME
> >  com.mysql.jdbc.Driver
> >
> jdbc:mysql://dbclust54:5791/medispan?sessionVariables=wait_timeout=2
> 8800,interactive_timeout=28800
> >  redacted
> >  redacted
> >  10
> >  5000
> >
> >
> >
> >
> >  true
> >  0
> >  1800
> >
> > 
>
> Hmm. There is a pool size setting there but we can't tell if it is being used.
>
> >> Is that Tomcat 9.0.80 as provided by the ASF?
>
> An explicit answer to this question would be helpful.
>

Didn't mean to seem evasive. Yes, it's from the ASF.


> In terms of the way forward, we need to see the thread dumps when the problem
> is happening to figure out where the blockage is happening and
> (hopefully) why.
>
> Mark

Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-28 Thread Mark Thomas

Hi Eric,

Follow-up observations and comments in-line.


What time does this problem start?


It typically starts around 9:15 am EDT and goes until around 10:30 am.


Does that match the time of highest request load from the customer? 
Rather than a spike, I'm wondering if the problem is triggered once load 
exceeds some threshold.



We finished and implemented the script yesterday, so today will be the first 
day that it produces results. It watches the catalina.out file for stuck thread 
detection warnings. When the number of stuck threads exceeds a threshold, then 
it starts doing thread dumps every 60 seconds until the count drops back down 
below the threshold. The users typically do not complain of slowness until the 
stuck thread count exceeds 20, and during that time the threads often take up 
to a minute or more to complete. It's too late today to change the timings, but 
if it does not produce any actionable intel, we can adjust them tonight.


Let's see what that produces and go from there.


The vendor claims that the feature uses a different server and does not send 
requests to the slow ones, so it has been re-enabled at the customer's request. 
We may ask them to disable it again until we get this issue resolved.


Noted.


This customer sends about 1.5 million requests to each load-balanced server 
during a typical production day. Most other customers send much less, often 
only a fraction of that. However, at least one customer sends about 2 million 
to the same server, and they don't see the problem. (I will check if they have 
the AI feature enabled.)


Hmm. Whether that other customer has the AI feature enabled would be an 
interesting data point.



Can we see the full stack trace please.


Here's one example.





 java.lang.Throwable
 at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1252)
 at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1220)




That is *very* interesting. That is the start of a synch block in the 
class loader. It should complete quickly. The full thread dump should 
tell us what is holding the lock. If we are lucky we'll be able to tell 
why the lock is being held for so long.


We might need to reduce the time between thread dumps to figure out what 
the thread that is blocking everything is doing. We'll see.



The app has DB connection details in two places. First, it uses a database 
connection string in a .properties file, as follows. This string handles most 
connections to the DB.

mobiledoc.DBUrl=jdbc:mysql://ha52a:5791
mobiledoc.DBName=mobiledoc_791?useSSL=false=round=false=true=true=false=true=true
mobiledoc.DBUser=
mobiledoc.DBPassword=


OK. That seems unlikely to be using connection pooling although the 
application might be pooling internally.



It also has second DB config specifically for a drug database.


   
   
 
 
 
 c:\out.log
   
   
   
   
   
 
 INSERT_CONTEXT_FACTORY
 INSERT_JNDI_URL
 INSERT_USER_NAME
 INSERT_PASSWORD
 INSERT_LOOKUP_NAME
 com.mysql.jdbc.Driver
 
jdbc:mysql://dbclust54:5791/medispan?sessionVariables=wait_timeout=28800,interactive_timeout=28800
 redacted
 redacted
 10
 5000
   
   
   
   
 true
 0
 1800
   



Hmm. There is a pool size setting there but we can't tell if it is being 
used.



Is that Tomcat 9.0.80 as provided by the ASF?


An explicit answer to this question would be helpful.

In terms of the way forward, we need to see the thread dumps when the 
problem is happening to figure out where the blockage is happening and 
(hopefully) why.


Mark




RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-28 Thread Eric Robinson
Hi Mark,

> -Original Message-
> From: Mark Thomas 
> Sent: Tuesday, May 28, 2024 3:42 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hi Eric,
>
> I have some follow-up questions in-line. I have also read the other messages
> in this thread and added a couple of additional questions based on what I read
> in those threads.
>
>
> On 26/05/2024 02:58, Eric Robinson wrote:
> > One of our hosting customers is a medical practice using a commercial EMR
> running on tomcat+mysql. It has operated well for over a year, but users have
> suddenly begun experiencing slowness for about an hour at the same time
> every day.
>
> What time does this problem start?
>

It typically starts around 9:15 am EDT and goes until around 10:30 am.

> Does it occur every day of the week including weekends?
>

Most weekdays. There have been 1 or 2 weekdays when it seems that symptom 
inexplicably did not appear. I'm not sure about weekends, as the medical 
practice does not work on those days.

> How does the slowness correlate to:
> - request volume
> - requests to any particular URL(s)?
> - requests from any particular client IP?
> - any other attribute of the request?
>

> (I'm trying to see if there is something about the requests that triggers the
> issue.)
>

We have not seen anything stand out. There are no apparent spikes in request 
volume. The slowness appears to impact all parts of the system (meaning all 
URLs). It manifests for the customer, but we have also seen it when we connect 
to the app internally, behind the firewall and reverse proxy, directly to the 
tomcat server from a workstation connected to the same switch.

> > During the slow times, we've done all the usual troubleshooting to catch the
> problem in the act. The servers have plenty of power and are not overworked.
> There are no slow database queries. Network connectivity is solid. Tomcat has
> plenty of memory. The numbers of database connections, threads, questions,
> queries, etc., remain steady, without spikes. There is no unusual disk 
> latency.
> We have not found any maintenance tasks running during that timeframe.
>
> I would usually suggest taking three thread dumps approximately 5s apart and
> then diffing them to try and spot "slow moving" threads.
>

> I see you have scripted a trigger for a thread dump when the slowness hits. If you
> haven't already, please configure it to capture (at least) 3 dumps
> ~5 seconds apart.
>
> (If we can spot the slow moving threads we might be able to identify what it 
> is
> that makes them slow moving.)
>

We finished and implemented the script yesterday, so today will be the first 
day that it produces results. It watches the catalina.out file for stuck thread 
detection warnings. When the number of stuck threads exceeds a threshold, then 
it starts doing thread dumps every 60 seconds until the count drops back down 
below the threshold. The users typically do not complain of slowness until the 
stuck thread count exceeds 20, and during that time the threads often take up 
to a minute or more to complete. It's too late today to change the timings, but 
if it does not produce any actionable intel, we can adjust them tonight.
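
The decision logic of such a watcher might look like the Java sketch below. It is not the script used here. It assumes the stuck-thread warnings in catalina.out include a running total in the form "There is/are [N] thread(s)" and that a completion message contains "but has completed"; both wordings are assumptions to verify against the real log. The class name StuckThreadWatcher is hypothetical, and taking the actual thread dump is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Tracks the reported stuck-thread total and says when dumps should be running. */
final class StuckThreadWatcher {
    // Assumed log wording; confirm against the actual catalina.out before relying on it.
    private static final Pattern TOTAL =
            Pattern.compile("There is/are (?:still )?\\[(\\d+)\\] thread\\(s\\)");

    private final int threshold;
    private int current;

    StuckThreadWatcher(int threshold) {
        this.threshold = threshold;
    }

    /** Feeds one log line; returns true while the stuck count is above the threshold. */
    boolean observe(String line) {
        Matcher m = TOTAL.matcher(line);
        if (m.find()) {
            current = Integer.parseInt(m.group(1));
        } else if (line.contains("but has completed")) {
            // Assumption: a completion message without a total means none remain.
            current = 0;
        }
        return current > threshold;
    }
}
```

A caller would feed it each new log line and, while observe returns true, trigger a thread dump at whatever interval is chosen.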

> > The customer has another load-balanced tomcat instance on a different
> physical server, and the problem happens on that one, too. The servers were
> upgraded with a new kernel and packages on 4/5/24, but the issue did not
> appear until 5/6/24. The vendor enabled a new feature in the customer's
> software, and the problem appeared the next day, but they subsequently
> disabled the feature, and (reportedly) the problem did not go away.
>
> Have you confirmed that the feature really is disabled? Or was it just hidden?
>

The vendor claims that the feature uses a different server and does not send 
requests to the slow ones, so it has been re-enabled at the customer's request. 
We may ask them to disable it again until we get this issue resolved.

> Has this feature been enabled for any other customers? If yes, have they
> experienced similar issues?
>

> (It is suspicious that the issue occurred after the feature was disabled. I 
> wonder
> if some elements of that change (e.g. a database
> change) are still in place and causing issues.)
>

We agree that it is suspicious, but at this point we are forced to give it the 
side-eye. We're not aware of other customers being impacted, but (a) it's a new 
AI-based feature, so not many other customers have it, (b) it is enabled by the 
vendor directly, so we are not in the notification loop, and (c) the problem 
customer is large, with about 800 staff, whereas most other customers are much 
smaller and might not trigger the symptom. Bottom line, we're not *sure*, but we think 
the feature is unrelated, but we'll ask them to disable it anyway.

> > It is worth mentioning that the servers are multi-tenanted, with other
> customers running the same medical 

Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-28 Thread Mark Thomas

Hi Eric,

I have some follow-up questions in-line. I have also read the other 
messages in this thread and added a couple of additional questions based 
on what I read in those threads.



On 26/05/2024 02:58, Eric Robinson wrote:

One of our hosting customers is a medical practice using a commercial EMR 
running on tomcat+mysql. It has operated well for over a year, but users have 
suddenly begun experiencing slowness for about an hour at the same time every 
day.


What time does this problem start?

Does it occur every day of the week including weekends?

How does the slowness correlate to:
- request volume
- requests to any particular URL(s)?
- requests from any particular client IP?
- any other attribute of the request?

(I'm trying to see if there is something about the requests that 
triggers the issue.)



During the slow times, we've done all the usual troubleshooting to catch the 
problem in the act. The servers have plenty of power and are not overworked. 
There are no slow database queries. Network connectivity is solid. Tomcat has 
plenty of memory. The numbers of database connections, threads, questions, 
queries, etc., remain steady, without spikes. There is no unusual disk latency. 
We have not found any maintenance tasks running during that timeframe.


I would usually suggest taking three thread dumps approximately 5s apart 
and then diffing them to try and spot "slow moving" threads.


I see you have scripted a trigger for a thread dump when the slowness hits. If 
you haven't already, please configure it to capture (at least) 3 dumps 
~5 seconds apart.


(If we can spot the slow moving threads we might be able to identify 
what it is that makes them slow moving.)
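
The "diff several dumps" technique can be sketched in Java as follows. This is an in-process illustration only: the suggestion above is about external thread dumps, which would be compared as text, whereas this sketch samples stacks from inside a JVM. SlowThreadFinder and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Finds threads whose stack is identical across several snapshots. */
final class SlowThreadFinder {

    /** Captures thread id to stack text for every live thread in this JVM. */
    static Map<Long, String> snapshot() {
        Map<Long, String> stacks = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            stacks.put(e.getKey().getId(), Arrays.toString(e.getValue()));
        }
        return stacks;
    }

    /** Returns ids of threads present in the first snapshot with the same stack in all others. */
    static List<Long> unchanged(List<Map<Long, String>> snapshots) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : snapshots.get(0).entrySet()) {
            boolean same = true;
            for (Map<Long, String> later : snapshots.subList(1, snapshots.size())) {
                if (!e.getValue().equals(later.get(e.getKey()))) {
                    same = false;
                    break;
                }
            }
            if (same) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Taking three snapshots about 5 seconds apart and passing them to unchanged gives the candidate "slow moving" threads. Idle pool threads parked waiting for work will also show up as unchanged, so the result still needs to be filtered by what each stack is actually doing.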



The customer has another load-balanced tomcat instance on a different physical 
server, and the problem happens on that one, too. The servers were upgraded 
with a new kernel and packages on 4/5/24, but the issue did not appear until 
5/6/24. The vendor enabled a new feature in the customer's software, and the 
problem appeared the next day, but they subsequently disabled the feature, and 
(reportedly) the problem did not go away.


Have you confirmed that the feature really is disabled? Or was it just 
hidden?


Has this feature been enabled for any other customers? If yes, have they 
experienced similar issues?


(It is suspicious that the issue occurred after the feature was 
disabled. I wonder if some elements of that change (e.g. a database 
change) are still in place and causing issues.)



It is worth mentioning that the servers are multi-tenanted, with other 
customers running the same medical application, but the others do not 
experience the slowdowns, even though they are on the same servers.


How does this customer compare, in terms of volume of requests, to other 
customers that are not experiencing this issue.


Is there anything unique or special about the customer experiencing the 
issue? Do they have some custom settings no-one else uses?


(I am trying to figure out if the issue is load related, customer 
specific or something else).



There are no unusual errors in the tomcat or database server logs, EXCEPT this 
one: Java.sql.DriverManager.getConnection


Can we see the full stack trace please.


During the periods of slowness, we see lots of those errors along with a large 
spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100). It 
seems obvious that the threads are stuck because tomcat is waiting on a 
connection to the database. However, tcpdump shows that connectivity to the 
database is perfect at the network and application layers. There are no 
unanswered SYNs, no retransmissions, no half-open connections, no failures to 
allocate TCP ports, no conntrack messages, and no other indications of system 
resource exhaustion. Every time tomcat requests a connection to the DB, it 
completes in less than 1 ms. Ten thousand connection attempts completed 
successfully in about 15 seconds, with zero failures.


It sounds like things might be getting stuck somewhere in or near the 
JDBC driver.


Can you provide the exact version of the JDBC driver you are using?

Can you provide the full database configuration from context.xml (or 
wherever it is configured). Please redact sensitive information such as 
passwords.



We are forced to conclude that some database connection requests are being 
initiated but are not being sent on the wire. The problem seems to be in the 
interaction between tomcat and the database driver, or in the driver itself.


I agree.


Unfortunately, the application vendor is taking the "it's your infrastructure" 
position without providing any evidence or offering suggestions for configuration changes,


I'm sorry to hear that. We'll do what we can to help.


other than to deploy more tomcat instances, which is just shooting in the dark. 
They don't know why the software is throwing 
java.sql.DriverManager.getConnection errors (even though it's their code), and 

RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-26 Thread Eric Robinson
Hi Chuck,

> -Original Message-
> From: Chuck Caldarale 
> Sent: Sunday, May 26, 2024 2:21 PM
> To: Tomcat Users List 
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
>
> > On May 25, 2024, at 20:58, Eric Robinson  wrote:
> >
> > One of our hosting customers is a medical practice using a commercial EMR
> running on tomcat+mysql. It has operated well for over a year, but users have
> suddenly begun experiencing slowness for about an hour at the same time
> every day. During the slow times, we've done all the usual troubleshooting to
> catch the problem in the act. The servers have plenty of power and are not
> overworked. There are no slow database queries. Network connectivity is solid.
> Tomcat has plenty of memory. The numbers of database connections, threads,
> questions, queries, etc., remain steady, without spikes. There is no unusual 
> disk
> latency. We have not found any maintenance tasks running during that
> timeframe.
>
>
> 
>
>
> > There are no unusual errors in the tomcat or database server logs, EXCEPT
> this one: Java.sql.DriverManager.getConnection
>
>
> 
>
>
> > During the periods of slowness, we see lots of those errors along with a 
> > large
> spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100). 
> It
> seems obvious that the threads are stuck because tomcat is waiting on a
> connection to the database.
>
>
> 
>
>
> > We are forced to conclude that some database connection requests are being
> initiated but are not being sent on the wire.
>
>
> Could the DB server be out of ports? (Seems unlikely, based on your debugging
> so far.)
>

We have not seen any indication of that.

> Any chance that the Tomcat process is running out of file descriptors? Or 
> ports?
>

Likewise, no indications of that.

> Can you force a garbage collection (e.g., with jconsole or similar tool) 
> during a
> slow period? If there is some limit on an OS-level resource that’s being 
> reached,
> a GC may be able to delete the Java objects that are tying up the underlying
> resources.

GC is on my list of things to try.

>
>   - Chuck
>
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org

Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.


RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-26 Thread Eric Robinson
Hi Thomas,


> -Original Message-
> From: Thomas Hoffmann (Speed4Trade GmbH)
> 
> Sent: Sunday, May 26, 2024 3:30 PM
> To: Tomcat Users List 
> Subject: AW: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hello,
>
> > -Ursprüngliche Nachricht-
> > Von: Chuck Caldarale 
> > Gesendet: Sonntag, 26. Mai 2024 21:21
> > An: Tomcat Users List 
> > Betreff: Re: Database Connection Requests Initiated but Not Sent on
> > the Wire (Some, Not All)
> >
> >
> > > On May 25, 2024, at 20:58, Eric Robinson 
> wrote:
> > >
> > > One of our hosting customers is a medical practice using a
> > > commercial EMR
> > running on tomcat+mysql. It has operated well for over a year, but
> > users have suddenly begun experiencing slowness for about an hour at
> > the same time every day. During the slow times, we've done all the
> > usual troubleshooting to catch the problem in the act. The servers
> > have plenty of power and are not overworked. There are no slow database
> queries. Network connectivity is solid.
> > Tomcat has plenty of memory. The numbers of database connections,
> > threads, questions, queries, etc., remain steady, without spikes.
> > There is no unusual disk latency. We have not found any maintenance
> > tasks running during that timeframe.
> >
> >
> > 
> >
> >
> > > There are no unusual errors in the tomcat or database server logs,
> > > EXCEPT
> > this one: Java.sql.DriverManager.getConnection
> >
> >
> > 
> >
> >
> > > During the periods of slowness, we see lots of those errors along
> > > with a large
> > spike in the number of stuck tomcat threads (from 1 or 2 to as high as
> > 100). It seems obvious that the threads are stuck because tomcat is
> > waiting on a connection to the database.
> >
> >
> > 
> >
> >
> > > We are forced to conclude that some database connection requests are
> > > being
> > initiated but are not being sent on the wire.
> >
> >
> > Could the DB server be out of ports? (Seems unlikely, based on your
> > debugging so far.)
> >
> > Any chance that the Tomcat process is running out of file descriptors? Or
> ports?
> >
> > Can you force a garbage collection (e.g., with jconsole or similar
> > tool) during a slow period? If there is some limit on an OS-level
> > resource that’s being reached, a GC may be able to delete the Java
> > objects that are tying up the underlying resources.
> >
> >   - Chuck
> >
>
>
> On the client side, the TCP connections are kept in a wait-state for usually 2
> minutes as far as I know.
> Maybe you can check how many are in this state.
>

On our server, we set things much lower to allow faster recycling of TCP 
connections...

net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1

> If the application doesn’t use connection pooling, then this can be the 
> problem
> itself too.

During peak production, there are a total of around 20,000 connections in 
various states, mostly TIME_WAIT. The port range is 5000-6.
dmesg, journalctl, and the messages file don't show any errors about running 
out of ports or file handles.


> TCP handshakes and logon process take a while and for performance reasons,
> DB connections are usually pooled.
>
> A stacktrace might help to see what java is doing when it enters this blocking
> state.
> Maybe you can provide a stack when the app starts blocking.
>

We are writing a script to watch for stuck threads to exceed a threshold, and 
do a thread dump when that happens.

> Greetings,
> Thomas


AW: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-26 Thread Thomas Hoffmann (Speed4Trade GmbH)
Hello,

> -Ursprüngliche Nachricht-
> Von: Chuck Caldarale 
> Gesendet: Sonntag, 26. Mai 2024 21:21
> An: Tomcat Users List 
> Betreff: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
> 
> 
> > On May 25, 2024, at 20:58, Eric Robinson  wrote:
> >
> > One of our hosting customers is a medical practice using a commercial EMR
> running on tomcat+mysql. It has operated well for over a year, but users have
> suddenly begun experiencing slowness for about an hour at the same time
> every day. During the slow times, we've done all the usual troubleshooting to
> catch the problem in the act. The servers have plenty of power and are not
> overworked. There are no slow database queries. Network connectivity is solid.
> Tomcat has plenty of memory. The numbers of database connections, threads,
> questions, queries, etc., remain steady, without spikes. There is no unusual 
> disk
> latency. We have not found any maintenance tasks running during that
> timeframe.
> 
> 
> 
> 
> 
> > There are no unusual errors in the tomcat or database server logs, EXCEPT
> this one: Java.sql.DriverManager.getConnection
> 
> 
> 
> 
> 
> > During the periods of slowness, we see lots of those errors along with a 
> > large
> spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100). 
> It
> seems obvious that the threads are stuck because tomcat is waiting on a
> connection to the database.
> 
> 
> 
> 
> 
> > We are forced to conclude that some database connection requests are being
> initiated but are not being sent on the wire.
> 
> 
> Could the DB server be out of ports? (Seems unlikely, based on your debugging
> so far.)
> 
> Any chance that the Tomcat process is running out of file descriptors? Or 
> ports?
> 
> Can you force a garbage collection (e.g., with jconsole or similar tool) 
> during a
> slow period? If there is some limit on an OS-level resource that’s being 
> reached,
> a GC may be able to delete the Java objects that are tying up the underlying
> resources.
> 
>   - Chuck
> 


On the client side, the TCP connections are kept in a wait-state for usually 2 
minutes as far as I know.
Maybe you can check how many are in this state.

If the application doesn’t use connection pooling, then this can be the problem 
itself too.
TCP handshakes and logon process take a while and for performance reasons, DB 
connections are usually pooled.

A stacktrace might help to see what java is doing when it enters this blocking 
state.
Maybe you can provide a stack when the app starts blocking.

Greetings, 
Thomas


Re: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-26 Thread Chuck Caldarale


> On May 25, 2024, at 20:58, Eric Robinson  wrote:
> 
> One of our hosting customers is a medical practice using a commercial EMR 
> running on tomcat+mysql. It has operated well for over a year, but users have 
> suddenly begun experiencing slowness for about an hour at the same time every 
> day. During the slow times, we've done all the usual troubleshooting to catch 
> the problem in the act. The servers have plenty of power and are not 
> overworked. There are no slow database queries. Network connectivity is 
> solid. Tomcat has plenty of memory. The numbers of database connections, 
> threads, questions, queries, etc., remain steady, without spikes. There is no 
> unusual disk latency. We have not found any maintenance tasks running during 
> that timeframe.





> There are no unusual errors in the tomcat or database server logs, EXCEPT 
> this one: Java.sql.DriverManager.getConnection

> During the periods of slowness, we see lots of those errors along with a 
> large spike in the number of stuck tomcat threads (from 1 or 2 to as high as 
> 100). It seems obvious that the threads are stuck because tomcat is waiting 
> on a connection to the database.

> We are forced to conclude that some database connection requests are being 
> initiated but are not being sent on the wire.


Could the DB server be out of ports? (Seems unlikely, based on your debugging 
so far.)

Any chance that the Tomcat process is running out of file descriptors? Or ports?

Can you force a garbage collection (e.g., with jconsole or similar tool) during 
a slow period? If there is some limit on an OS-level resource that’s being 
reached, a GC may be able to delete the Java objects that are tying up the 
underlying resources.
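
A rough sketch of how the descriptor question could be checked from inside the JVM, assuming a HotSpot/OpenJDK runtime on a Unix-like OS, where the platform MXBean implements com.sun.management.UnixOperatingSystemMXBean. The class name FdUsage is hypothetical, and System.gc() is only a request that the JVM may ignore:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class FdUsage {
    /** Returns {open, max} file descriptor counts, or null if this JVM does not expose them. */
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null; // e.g. on Windows or a JVM without this MXBean
    }

    public static void main(String[] args) {
        long[] before = fdCounts();
        if (before == null) {
            System.out.println("File descriptor counts are not available on this JVM");
            return;
        }
        System.out.println("open=" + before[0] + " max=" + before[1]);
        System.gc(); // a hint only; unreachable sockets and streams may be released afterwards
        long[] after = fdCounts();
        System.out.println("after GC request: open=" + after[0]);
    }
}
```

The same two values are visible in jconsole under the java.lang:type=OperatingSystem MBean, which avoids deploying any code. If the open count sits close to the maximum during the slow hour, descriptor exhaustion becomes a plausible explanation.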

  - Chuck


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-26 Thread Eric Robinson
Hi Thomas,


> -----Original Message-----
> From: Thomas Hoffmann (Speed4Trade GmbH)
> Sent: Sunday, May 26, 2024 2:52 AM
> To: Tomcat Users List 
> Subject: AW: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hello Eric,
>
> > -----Original Message-----
> > From: Eric Robinson
> > Sent: Sunday, May 26, 2024 03:59
> > To: users@tomcat.apache.org
> > Subject: Database Connection Requests Initiated but Not Sent on the
> > Wire (Some, Not All)
> >
> > One of our hosting customers is a medical practice using a commercial
> > EMR running on tomcat+mysql. It has operated well for over a year, but
> > users have suddenly begun experiencing slowness for about an hour at
> > the same time every day. During the slow times, we've done all the
> > usual troubleshooting to catch the problem in the act. The servers
> > have plenty of power and are not overworked. There are no slow database
> > queries. Network connectivity is solid.
> > Tomcat has plenty of memory. The numbers of database connections,
> > threads, questions, queries, etc., remain steady, without spikes.
> > There is no unusual disk latency. We have not found any maintenance
> > tasks running during that timeframe.
> >
> > The customer has another load-balanced tomcat instance on a different
> > physical server, and the problem happens on that one, too. The servers
> > were upgraded with a new kernel and packages on 4/5/24, but the issue
> > did not appear until 5/6/24. The vendor enabled a new feature in the
> > customer's software, and the problem appeared the next day, but they
> > subsequently disabled the feature, and (reportedly) the problem did
> > not go away. It is worth mentioning that the servers are
> > multi-tenanted, with other customers running the same medical
> > application, but the others do not experience the slowdowns, even though
> > they are on the same servers.
> >
> > There are no unusual errors in the tomcat or database server logs,
> > EXCEPT this one: java.sql.DriverManager.getConnection
> >
> > During the periods of slowness, we see lots of those errors along with
> > a large spike in the number of stuck tomcat threads (from 1 or 2 to as
> > high as 100). It seems obvious that the threads are stuck because
> > tomcat is waiting on a connection to the database. However, tcpdump
> > shows that connectivity to the database is perfect at the network and
> > application layers. There are no unanswered SYNs, no retransmissions,
> > no half-open connections, no failures to allocate TCP ports, no
> > conntrack messages, and no other indications of system resource
> > exhaustion. Every time tomcat requests a connection to the DB, it
> > completes in less than 1 ms. Ten thousand connection attempts completed
> > successfully in about 15 seconds, with zero failures.
> >
> > We are forced to conclude that some database connection requests are
> > being initiated but are not being sent on the wire. The problem seems
> > to be in the interaction between tomcat and the database driver, or in
> > the driver itself. Unfortunately, the application vendor is taking the
> > "it's your infrastructure" position without providing any evidence or
> > offering suggestions for
> > configuration changes, other than to deploy more tomcat instances,
> > which is just shooting in the dark. They don't know why the software
> > is throwing java.sql.DriverManager.getConnection errors (even though
> > it's their code), and they've relegated the investigation to us.
> >
> > Any advice from the community would be greatly appreciated.
> >
> > RHEL 8.9, kernel 4.18.0-513.18.1.el8_9.x86_64
> > Apache Tomcat/9.0.80, JVM 1.8.0_372-b07
> >
> > (The tomcat and JVM versions are the ones recommended by the vendor.)
> >
> > We're standing by to provide whatever other information the community
> > may need.
> >
> > Thanks tons!
> >
> > -Eric
>
> The database connections are usually pooled.
> If the pool is exhausted, the thread will wait until a connection is
> returned to the pool and can be reused.
> Do you use connection pooling?
> What does the configuration look like?
> Do you monitor the pool usage?
>
> In general, it doesn’t look like a Tomcat issue per se.
>
> Greetings,
> Thomas
>

I have asked the vendor that question several times, but their technicians have 
never provided a clear answer. Most of the time they have not even understood 
the question. If pooling were enabled, I would expect to see maxTotal or 
maxIdle attributes in a context.xml or web.xml file somewhere in the system, but they 
are not there. The database connection details are stored in a file named 
myapp.properties (renamed here to protect the identity of the vendor, as I 
imagine they visit this forum). The string looks like this:

myapp.DBName=mydb?useSSL=false=round=false=true=true=false=true=true

In Wireshark, I see thousands of connections, with anywhere from one to several 
SQL queries issued over each connection before it closes. 

AW: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-26 Thread Thomas Hoffmann (Speed4Trade GmbH)
Hello Eric,

> -----Original Message-----
> From: Eric Robinson
> Sent: Sunday, May 26, 2024 03:59
> To: users@tomcat.apache.org
> Subject: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
> 
> One of our hosting customers is a medical practice using a commercial EMR
> running on tomcat+mysql. It has operated well for over a year, but users have
> suddenly begun experiencing slowness for about an hour at the same time
> every day. During the slow times, we've done all the usual troubleshooting to
> catch the problem in the act. The servers have plenty of power and are not
> overworked. There are no slow database queries. Network connectivity is solid.
> Tomcat has plenty of memory. The numbers of database connections, threads,
> questions, queries, etc., remain steady, without spikes. There is no unusual
> disk latency. We have not found any maintenance tasks running during that
> timeframe.
> 
> The customer has another load-balanced tomcat instance on a different
> physical server, and the problem happens on that one, too. The servers were
> upgraded with a new kernel and packages on 4/5/24, but the issue did not
> appear until 5/6/24. The vendor enabled a new feature in the customer's
> software, and the problem appeared the next day, but they subsequently
> disabled the feature, and (reportedly) the problem did not go away. It is
> worth mentioning that the servers are multi-tenanted, with other customers running
> the same medical application, but the others do not experience the slowdowns,
> even though they are on the same servers.
> 
> There are no unusual errors in the tomcat or database server logs, EXCEPT this
> one: java.sql.DriverManager.getConnection
> 
> During the periods of slowness, we see lots of those errors along with a large
> spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100).
> It seems obvious that the threads are stuck because tomcat is waiting on a
> connection to the database. However, tcpdump shows that connectivity to the
> database is perfect at the network and application layers. There are no
> unanswered SYNs, no retransmissions, no half-open connections, no failures to
> allocate TCP ports, no conntrack messages, and no other indications of system
> resource exhaustion. Every time tomcat requests a connection to the DB, it
> completes in less than 1 ms. Ten thousand connection attempts completed
> successfully in about 15 seconds, with zero failures.
> 
> We are forced to conclude that some database connection requests are being
> initiated but are not being sent on the wire. The problem seems to be in the
> interaction between tomcat and the database driver, or in the driver itself.
> Unfortunately, the application vendor is taking the "it's your infrastructure"
> position without providing any evidence or offering suggestions for
> configuration changes, other than to deploy more tomcat instances, which is
> just shooting in the dark. They don't know why the software is throwing
> java.sql.DriverManager.getConnection errors (even though it's their code), and
> they've relegated the investigation to us.
> 
> Any advice from the community would be greatly appreciated.
> 
> RHEL 8.9, kernel 4.18.0-513.18.1.el8_9.x86_64 Apache Tomcat/9.0.80, JVM
> 1.8.0_372-b07
> 
> (The tomcat and JVM versions are the ones recommended by the vendor.)
> 
> We're standing by to provide whatever other information the community may
> need.
> 
> Thanks tons!
> 
> -Eric

The database connections are usually pooled.
If the pool is exhausted, the thread will wait until a connection is returned 
to the pool and can be reused.
Do you use connection pooling?
What does the configuration look like?
Do you monitor the pool usage?
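
To illustrate the waiting behaviour described above, here is a toy pool built on a blocking queue. It is not how Tomcat's pool or any real JDBC pool is implemented, and TinyPool and its method names are invented for the example; it only shows why request threads pile up when every pooled connection is checked out:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class TinyPool<T> {
    private final BlockingQueue<T> idle;

    TinyPool(Collection<T> resources) {
        // Fixed capacity: the pool never holds more than the resources it started with.
        idle = new ArrayBlockingQueue<>(resources.size(), true, resources);
    }

    /** Waits up to timeoutMs for a free resource; returns null if none came back in time. */
    T borrow(long timeoutMs) {
        try {
            return idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    void giveBack(T resource) {
        idle.offer(resource);
    }

    public static void main(String[] args) {
        TinyPool<String> pool = new TinyPool<>(Arrays.asList("conn-1"));
        String held = pool.borrow(100);
        System.out.println("first borrow:  " + held);
        System.out.println("second borrow: " + pool.borrow(100)); // pool is empty, so this times out
        pool.giveBack(held);
        System.out.println("after return:  " + pool.borrow(100));
    }
}
```

With a real pool, the equivalent of the second borrow is a request thread sitting in a wait until another thread returns its connection, which from the outside looks like a stuck thread with no traffic on the wire.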

In general, it doesn’t look like a Tomcat issue per se.

Greetings,
Thomas




Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-25 Thread Eric Robinson
One of our hosting customers is a medical practice using a commercial EMR 
running on tomcat+mysql. It has operated well for over a year, but users have 
suddenly begun experiencing slowness for about an hour at the same time every 
day. During the slow times, we've done all the usual troubleshooting to catch 
the problem in the act. The servers have plenty of power and are not 
overworked. There are no slow database queries. Network connectivity is solid. 
Tomcat has plenty of memory. The numbers of database connections, threads, 
questions, queries, etc., remain steady, without spikes. There is no unusual 
disk latency. We have not found any maintenance tasks running during that 
timeframe.

The customer has another load-balanced tomcat instance on a different physical 
server, and the problem happens on that one, too. The servers were upgraded 
with a new kernel and packages on 4/5/24, but the issue did not appear until 
5/6/24. The vendor enabled a new feature in the customer's software, and the 
problem appeared the next day, but they subsequently disabled the feature, and 
(reportedly) the problem did not go away. It is worth mentioning that the 
servers are multi-tenanted, with other customers running the same medical 
application, but the others do not experience the slowdowns, even though they 
are on the same servers.

There are no unusual errors in the tomcat or database server logs, EXCEPT this 
one: java.sql.DriverManager.getConnection

During the periods of slowness, we see lots of those errors along with a large 
spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100). It 
seems obvious that the threads are stuck because tomcat is waiting on a 
connection to the database. However, tcpdump shows that connectivity to the 
database is perfect at the network and application layers. There are no 
unanswered SYNs, no retransmissions, no half-open connections, no failures to 
allocate TCP ports, no conntrack messages, and no other indications of system 
resource exhaustion. Every time tomcat requests a connection to the DB, it 
completes in less than 1 ms. Ten thousand connection attempts completed 
successfully in about 15 seconds, with zero failures.
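
For anyone who wants to repeat a connection test of this kind, a sketch along these lines would do it. This is not the tool that produced the numbers above; it measures only the TCP connect and close, not the MySQL login, and the class name ConnectTimer, the host, and the port are placeholders:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ConnectTimer {
    /** Opens and closes TCP connections in a loop; returns {failures, slowest attempt in microseconds}. */
    static long[] probe(String host, int port, int attempts, int timeoutMs) {
        long failures = 0;
        long slowestNanos = 0;
        for (int i = 0; i < attempts; i++) {
            long start = System.nanoTime();
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), timeoutMs);
            } catch (IOException e) {
                failures++; // refused, timed out, unreachable, or unresolved host
            }
            slowestNanos = Math.max(slowestNanos, System.nanoTime() - start);
        }
        return new long[] {failures, slowestNanos / 1_000};
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";          // placeholder host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 3306;  // assumed MySQL default
        long[] r = probe(host, port, 1000, 1000);
        System.out.println("failures=" + r[0] + " slowest=" + r[1] + "us");
    }
}
```

Because it bypasses the JDBC driver entirely, a clean result from a probe like this says the network path is healthy but nothing about what happens inside DriverManager.getConnection before a socket is opened.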

We are forced to conclude that some database connection requests are being 
initiated but are not being sent on the wire. The problem seems to be in the 
interaction between tomcat and the database driver, or in the driver itself. 
Unfortunately, the application vendor is taking the "it's your infrastructure" 
position without providing any evidence or offering suggestions for 
configuration changes, other than to deploy more tomcat instances, which is 
just shooting in the dark. They don't know why the software is throwing 
java.sql.DriverManager.getConnection errors (even though it's their code), and 
they've relegated the investigation to us.

Any advice from the community would be greatly appreciated.

RHEL 8.9, kernel 4.18.0-513.18.1.el8_9.x86_64
Apache Tomcat/9.0.80, JVM 1.8.0_372-b07

(The tomcat and JVM versions are the ones recommended by the vendor.)

We're standing by to provide whatever other information the community may need.

Thanks tons!

-Eric


