Note: I kept dev@roller on CC, which was added by Dave.
Update inline.
On 15.07.2014 10:45, Rainer Jung wrote:
1) Trivial: Add response duration to Tomcat access log
Add "%D" at end of pattern string in server.xml access log configuration.
Done; also added "%I" (thread name) and "%S" (session id) to the access
log pattern in server.xml. The thread name is useful to correlate requests
with e.g. thread dumps and log statements (if the log config also logs the
thread name).
File committed to svn.
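For reference, with those additions the valve pattern would look roughly
like this (directory/prefix/suffix shown with typical defaults, the
committed server.xml may differ in detail):
<!-- illustrative values, only the pattern additions matter here -->
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
       prefix="localhost_access_log." suffix=".txt"
       pattern="%h %l %u %t &quot;%r&quot; %s %b %D %I %S" />
%D is the request duration in milliseconds, %I the name of the processing
thread and %S the session id.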
2) Medium: Add GC log
To rule out GC problems (we don't expect them, but I would like to make
sure), add a couple of startup parameters that write sensible GC data to
a log.
Note that GC logs have no nice way of rotation. Even the new JVM flags
allowing GC log rotation will overwrite GC log files when the process
starts. So the theoretical risk is writing big log files, but since
roller is restarted relatively frequently, getting rid of the log by
removing the new startup parameters is easy.
Done by adding bin/setenv.sh to CATALINA_BASE, which is automatically
read by the start script catalina.sh, which in turn is used by the Ubuntu
platform Tomcat underlying our roller installation.
Config used:
CATALINA_OPTS="
-verbose:gc
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-Xloggc:$CATALINA_BASE/logs/gc.log_`date +%Y%m%d_%H%M%S`
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=50M"
Committed to svn.
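Once some data has accumulated, the largest stop-the-world pauses (in
seconds) can be eyeballed with something like the following one-liner
(illustration only, not committed anywhere):
# print the ten longest recorded safepoint pauses
grep -h "Total time for which application threads were stopped" \
  $CATALINA_BASE/logs/gc.log_* \
  | sed 's/.*stopped: \([0-9.]*\) seconds.*/\1/' | sort -rn | head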
3) Medium: Add analysis code
I'd like to deploy an additional app inside the roller Tomcat that
tracks long running requests and writes out stack information only for
the threads that have long running requests on them. The app is not open
source but my own IP, so there is no licensing issue here. The output is
written to a rotating log file that can then be investigated.
The app is packaged as a normal webapp (war) but it doesn't actually
provide a useful web interface. Packaging as a war is just an easy way
of getting it deployed inside Tomcat.
I have had that code running in important and high-load applications
for a few years, so I see no risk of it negatively influencing stability
or performance. It does not do any tracing; it only collects snapshots
of the threads that are working on long running requests, at regular
intervals such as once per minute.
Added as CATALINA_BASE/conf/perfdiag.properties and
CATALINA_BASE/webapps/kpdtperfdiag.war. Writes logs to
CATALINA_BASE/logs/*perfdiag* with daily rotation. But see below.
Config: check once every 30 seconds for requests that have been in
processing for longer than 10 seconds, and if any are found, write out
their current stacks.
Deliberately not added to svn.
4) Trivial: Turn off the https connector
The Tomcat https connector is currently configured only partially and
complains on startup. We don't currently use it, because the ssl
endpoint is on erebus-ssl, which forwards to the normal http port.
Done in server.xml, committed to svn.
5) Medium: Enabling Tomcat Manager Webapp with role manager-jmx
Enable the Tomcat Manager webapp with a user in the manager-jmx role
and a hashed password in tomcat-users.xml.
This gives me the opportunity to collect JMX information on the fly via
simple lightweight scripts (see the example below).
Done by
- adding the tomcat7-admin package to the system
- disabling the host-manager by renaming host-manager.xml (add trailing
underscore)
- committed manager.xml to svn
- added a user plus password for access to the manager
- changed realm config in server.xml to use hashed passwords
Committed to svn.
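For illustration, the kind of lightweight query this enables, assuming
the manager stays under its default /manager path (user, password and
port are placeholders):
# heap and non-heap memory usage
curl -s -u jmxuser:secret \
  'http://localhost:8080/manager/jmxproxy/?qry=java.lang:type=Memory'
# connector thread pool state
curl -s -u jmxuser:secret \
  'http://localhost:8080/manager/jmxproxy/?qry=Catalina:type=ThreadPool,*'
The output is plain text and easy to parse from small cron scripts.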
6) More complex: Caching
Since roller serves only a small amount of content, caching would be
technically cheap. Unfortunately the response cache headers currently do
not help caching. So if we want to go down this road, we would have to
configure or fix roller to set useful caching headers (or override them
in Tomcat). Plus we would need to set up the cache, be it inside the
Apache reverse proxy in front of roller or a separate component.
I'd like to execute quickly on 1-5 and investigate the collected data.
Note that the db connection pooling was recently turned off again,
because problems were observed.
Not done.
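Should we pick up the caching idea (6) later, one possible direction for
the header part would be to override the headers on the Apache reverse
proxy instead of inside roller, e.g. (hypothetical sketch, assuming
mod_headers is loaded there; path pattern and max-age purely for
illustration):
# illustrative only: mark static-ish resources cacheable for 5 minutes
<LocationMatch "\.(css|js|png|gif|jpe?g)$">
    Header set Cache-Control "public, max-age=300"
</LocationMatch>
The actual cache could then sit in the proxy or in a separate component,
as described above.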
My current impressions:
- After full startup the instance is performing well. There were no long
running requests (>10 seconds) during the first few hours except during
startup. As long as this is unchanged, the added perfdiag will not log
info about long running requests, because there are none.
- there were only very few requests above 2 seconds (a few per hour),
and not many above 1 second.
- request load is about 2-3 requests per second
- heap size is a bit small. Under current load GC still behaves
acceptably, but there is very little headroom. Unfortunately the heap
settings for this platform Tomcat are in the rc script. I changed the
value there from 128MB to 256MB for the next restart. That is still not
much, but will increase our headroom a lot.
- the current configuration does not use a DB pool. Instead whenever a
DB connection is needed it is created on the fly and then thrown away
after use. I'm astonished how well this works. It does produce about 500
db connections in TIME_WAIT state (after they have been successfully
closed), which is not a problem per se. It would become a problem if
this number goes above roughly 5000, because then some TCP actions can
get slow.
- I still think the config should switch to a db pool, but it is said
that problems occurred after that was done. For instance there was an
observation that after some time roller tried to open db connections to
a wrong host. I didn't have a chance to observe it myself.
- after I prepared the above changes, we actually had a situation where
our monitoring checks independently fired an alarm for roller. Humbedooh
did two thread dumps which are in the catalina.out file.
Those dumps show the following picture (db pool was off):
- 57 threads in the stack:
at com.mysql.jdbc.ResultSetImpl.getInstance(ResultSetImpl.java:355)
at com.mysql.jdbc.MysqlIO.buildResultSetWithUpdates(MysqlIO.java:2673)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2561)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1757)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2171)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2562)
...
- 23 threads in the stack:
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:410)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2581)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1757)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2171)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2562 or 2568)
- locked <0xHEXADDR> (a java.lang.Object)
...
- Another 16 threads in the stack:
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:307)
at com.mysql.jdbc.ConnectionImpl.connectWithRetries(ConnectionImpl.java:2168)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2127)
- locked <0xHEXADDR> (a java.lang.Object)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:774)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:49)
...
- 12 threads in the stack:
at com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2429)
at com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2409)
at com.mysql.jdbc.ConnectionImpl.getMetadataSafeStatement(ConnectionImpl.java:3011)
at com.mysql.jdbc.ConnectionImpl.loadServerVariables(ConnectionImpl.java:3773)
at com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3365)
at com.mysql.jdbc.ConnectionImpl.connectWithRetries(ConnectionImpl.java:2189)
...
and another 29 threads were in different phases of request handling, of
which 12 were again caught in various places inside the MySQL JDBC
driver. That sums up to a total of about 120 threads being somewhere
inside JDBC code.
Whether the root cause is a communication problem with the db, or a db
related config issue, or whether the root cause lies outside, e.g. GC
problems, should be decidable the next time it happens.
The current jdbc config is based on the following connection string:
jdbc:mysql://***:***/***?autoReconnect=true&useUnicode=true&characterEncoding=utf-8&mysqlEncoding=utf8&InitialCapacity=10&MaxCapacity=25
I could find no indication that "InitialCapacity=10&MaxCapacity=25" has
any functional meaning here. It looks like we should drop those
parameters to reduce confusion.
As for autoReconnect, it is said to be deprecated and is not really
recommended.
I would suggest switching back to a connection pool with the following
config:
<Resource name="jdbc/rollerdb" auth="Container"
type="javax.sql.DataSource"
driverClassName="com.mysql.jdbc.Driver"
url="jdbc:mysql://***:***/***?useUnicode=true&characterEncoding=utf-8&mysqlEncoding=utf8"
username="***"
password="***"
initialSize="10"
maxActive="50"
maxIdle="5"
testWhileIdle="true"
testOnBorrow="true"
testOnReturn="false"
validationQuery="SELECT 1"
removeAbandoned="true"
removeAbandonedTimeout="60"
logAbandoned="true"
maxWait="3000"
/>
I have dropped the eviction config, because I have not yet checked how
old the bundled dbcp+pool version is, and the eviction and abandoned
features had locking problems for a long time (no longer in recent
versions). We would like to keep the abandoned handling, but eviction is
not that important.
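For reference, the eviction part I left out would be something along
these lines (standard DBCP attribute names, values only as an example):
    timeBetweenEvictionRunsMillis="30000"
    minEvictableIdleTimeMillis="60000"
    numTestsPerEvictionRun="3"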
I guess we also need to adjust properties in lib/planet-custom.properties
and lib/roller-custom.properties if we want to switch to a pool defined
in the context.xml (ROOT.xml) once more.
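If my memory of the Roller install guide is right, the corresponding
switch in lib/roller-custom.properties would be roughly (property names
to be verified against the guide):
# assumption: Roller's JNDI datasource configuration
database.configurationType=jndi
database.jndi.name=jdbc/rollerdb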
I will add a few scripts during the next day, which will track some
data, e.g. the number of open db connections (to detect a possible
leak). I will also check whether the service doc needs an update w.r.t.
my recent changes.
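A minimal sketch of the connection tracking part (assuming MySQL on its
default port 3306; the log path and CATALINA_BASE handling are just
placeholders):
#!/bin/sh
# Append a timestamped count of established and TIME_WAIT
# connections to the MySQL port, for later correlation.
TS=$(date '+%Y-%m-%d %H:%M:%S')
EST=$(netstat -tan | awk '$5 ~ /:3306$/ && /ESTABLISHED/' | wc -l)
TW=$(netstat -tan | awk '$5 ~ /:3306$/ && /TIME_WAIT/' | wc -l)
echo "$TS established=$EST time_wait=$TW" >> $CATALINA_BASE/logs/dbconn.log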
The update of the roller version proposed by Dave does not need to wait
for any of the additional actions I proposed above. Whatever we are
ready to do should be acted upon.
Regards,
Rainer