Re: Tomcat 8.x-9.x + Struts 1.3.x - Applications will get into a state where they won't serve a random bunch of static resources, 500 errors

2018-08-06 Thread Chrifister
I may have found the problem and solution. We went back and had a look at
all the logs, not just catalina.out. There was a localhost log made by
Tomcat, and in there were a number of exceptions from when the application
was in that weird state. The exceptions were as follows:

 java.io.IOException: Broken pipe

 java.sql.SQLException: Connection has already been closed.

 org.hibernate.TransactionException: JDBC begin transaction failed:

 org.hibernate.TransactionException: Already have an associated managed
connection

This last one repeats every second or so until the application is
restarted. This led me to a custom request filter that came from a book
about Hibernate, specifically a section about transaction management. It
says to create a filter and to wrap the transaction around the filter
chain. The code in the filter is as follows:

Session session = HibernateSessionFactory.getSession();
Transaction tx = session.beginTransaction();

try {
    chain.doFilter(request, response);
    tx.commit();
    session.close();
} catch (Throwable ex) {
    try {
        if (tx.isActive()) {
            tx.rollback();
            session.close();
        }
    } catch (Throwable rbEx) {
        rbEx.printStackTrace();
    }
    throw new ServletException(ex);
}

The first exception about a broken pipe indicates a connection was lost.
This is probably from mobile users losing signal or something. Since the
connection was lost, the connection gets closed somewhere along the way. So
if the connection is lost during chain.doFilter(), then when it tries
tx.commit(), it fails because the connection is already closed. Control
falls through to the catch block, and I assume tx.isActive() is false at
that point, so session.close() is never called. The next request fails on
session.beginTransaction(), and then, since we are forever throwing
exceptions, session.close() is never called and the app can no longer fully
process requests because it never makes it to chain.doFilter(). This theory
follows the order of the thrown exceptions in the logs.
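
For anyone following along: the usual book pattern for
HibernateSessionFactory caches the Session in a ThreadLocal, which would
explain why one leaked Session keeps poisoning later requests handled by
the same Tomcat worker thread. A rough sketch of that pattern (this is an
assumption about the helper, not necessarily our exact class):

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateSessionFactory {

    private static final SessionFactory sessionFactory =
            new Configuration().configure().buildSessionFactory();

    // One Session cached per worker thread.
    private static final ThreadLocal<Session> threadSession =
            new ThreadLocal<Session>();

    public static Session getSession() {
        Session session = threadSession.get();
        if (session == null || !session.isOpen()) {
            session = sessionFactory.openSession();
            threadSession.set(session);
        }
        // If a previous request on this thread never closed its Session, the
        // same broken Session (still open, dead connection, half-started
        // transaction) is handed straight to the next request.
        return session;
    }
}

With Tomcat reusing its worker threads across requests, that one stuck
Session would match the endless "Already have an associated managed
connection" errors we saw in the localhost log.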

I can reproduce the issue locally by adding a breakpoint to an action class
and letting Eclipse sit there for a couple of minutes. I assume the JDBC
driver closes the connection, because when I resume the application I'm
seeing the same symptoms as the issue we've been having. With the following
updated code, only one exception about the connection already being closed
is thrown, but the application recovers and is still usable. I may still
add a catch for that SQLException and just log a message about the
connection being lost or something (sketched out after the updated code
below).

Session session = HibernateSessionFactory.getSession();
Transaction tx = session.beginTransaction();

try {
    chain.doFilter(request, response);
    if (session.isOpen() && tx.isActive())
        tx.commit();
} catch (Throwable ex) {
    try {
        if (session.isOpen() && tx.isActive())
            tx.rollback();
    } catch (Throwable rbEx) {
        rbEx.printStackTrace();
    }
    throw new ServletException(ex);
} finally {
    if (session.isOpen())
        session.close();
}
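
For reference, here's roughly what I have in mind for that extra catch,
written out as a full filter class. The class name and the way it logs are
placeholders rather than our real filter, so treat it as a sketch:

import java.io.IOException;
import java.sql.SQLException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

import org.hibernate.Session;
import org.hibernate.Transaction;

public class HibernateTransactionFilter implements Filter {

    public void init(FilterConfig config) {
    }

    public void destroy() {
    }

    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {

        Session session = HibernateSessionFactory.getSession();
        Transaction tx = session.beginTransaction();

        try {
            chain.doFilter(request, response);
            if (session.isOpen() && tx.isActive())
                tx.commit();
        } catch (Throwable ex) {
            try {
                if (session.isOpen() && tx.isActive())
                    tx.rollback();
            } catch (Throwable rbEx) {
                rbEx.printStackTrace();
            }
            if (causedBySqlException(ex)) {
                // The connection was dropped mid-request; log it instead of
                // turning it into yet another 500.
                System.err.println("Connection lost during request: " + ex);
            } else {
                throw new ServletException(ex);
            }
        } finally {
            if (session.isOpen())
                session.close();
        }
    }

    // Walk the cause chain looking for a plain SQLException.
    private boolean causedBySqlException(Throwable ex) {
        for (Throwable t = ex; t != null; t = t.getCause()) {
            if (t instanceof SQLException)
                return true;
        }
        return false;
    }
}

The idea is just to keep a dropped connection from bubbling up as a 500
while still letting genuine application errors through.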

We'll deploy the fix to the two most problematic applications tomorrow and
hopefully we won't see the issue anymore. Thanks to everyone who commented
about the logs; that prompted me to go have a deeper look at everything
Tomcat was logging. I'm still confused as to why the exceptions never made
it to catalina.out, though.

On Mon, Aug 6, 2018 at 2:25 PM Mark Thomas  wrote:

>
>
> On 06/08/2018 16:54, Chrifister wrote:
>
> 
>
> > Any help would be greatly appreciated.
>
> No real ideas. Just requests for more information and an observation.
>
> The 500 responses should have triggered stack traces in the logs. Can
> you provide some sample stack traces of the errors you are seeing.
>
> What does a thread dump show?
>
> Are you accessing Tomcat directly or via a reverse proxy such as httpd?
>
> That only one app of several gets into this state while other
> applications work correctly points towards an application issue rather
> than anything else.
>


Tomcat 8.x-9.x + Struts 1.3.x - Applications will get into a state where they won't serve a random bunch of static resources, 500 errors

2018-08-06 Thread Chrifister
Hi,

Our current setup is Tomcat 9.0.8 running on SuSE Enterprise. This server
is running a dozen web applications built with Struts 1.3.8, with some
newer Spring applications on the horizon. There is a large user base, with
some applications seeing heavy usage. Applications are currently using
Java 1.7 and 1.8.

We were originally running Tomcat 7.x but were having issues with perm gen
maxing out very quickly for unknown reasons, possibly related to a buggy
third-party "enterprise-grade" reporting Java library. We had to restart
the server nightly to try to keep perm gen from maxing out. Part of the
reason was that this third-party library spawned immortal threads that
would prevent an application from being unloaded and garbage collected when
a newer build of the application was deployed (the developers behind it
never expected the library would be run on a server with multiple
applications). So we upgraded Tomcat to 8.5.x first and then to 9.x
recently. This fixed the perm gen issue.

The current issue we are having is that, for some unknown reason and after
seemingly random lengths of time, an application will get into a state
where it starts having issues, resulting in failed page loads or pages not
loading correctly. According to the Network tab in Chrome's developer
console, a random bunch of static resources (JavaScript, CSS, images)
return 500 errors and are not served. Whether the page loads or not depends
on exactly which resources were not returned. Every time you access any
page in that application, another random bunch of resources gets 500
errors. There is no indication in any of Tomcat's log files that an
application is in this state. The application will stay in this unusable
state until it or the server is restarted.

We've resorted to once again scheduling the server to restart nightly,
which has cut down on the frequency of this happening and hints at it being
related to usage, but it is still happening once a week and sometimes more.
The applications that seem to experience this the most are, I believe, the
more heavily used ones.

No Spring application has experienced this issue on our other servers,
which leads me to tentatively say that Spring is not affected and/or is not
a cause of the issue, but upgrading all applications to Spring is not
feasible at the moment.

We've tried upgrading Struts in the most frequently affected applications
to 1.3.10, but it did not solve the issue and actually afflicted us with
another issue stemming from a bug in that Struts version, so we had to go
back to 1.3.8.

I spoke with a couple of people in Tomcat's IRC channel and they seemed to
think it was a third-party library or a problem/race condition between the
Struts and Tomcat servlets. While this may be important information, I have
no idea what to do with it.

I'm not sure debugging is a possibility because it's a remote server and I
wouldn't even know what to look for. I also can't allow a production
application to remain in this state for very long.

I can't file a bug report because I can't reproduce it at will and I am
unable to provide thread or heap dumps.

I have a suspicion it may be caused by that third-party library, although I
don't see how that library would affect Tomcat's serving of static
resources.

This issue has never happened on our test server or our local instances of
Tomcat. Since I suspect it's related to usage, this is not surprising.

Any help would be greatly appreciated.

Chris