Hi,

On Thu, Jun 4, 2020 at 9:28 PM Rémy Maucherat <r...@apache.org> wrote:

> On Thu, Jun 4, 2020 at 6:16 PM Mark Thomas <ma...@apache.org> wrote:
>
>> On 04/06/2020 16:33, Martin Grigorov wrote:
>> > Hi team,
>> >
>> > I am profiling our application and I've found that Tomcat uses its own
>> > Synchronized Queue/Stack implementations which synchronize on instance
>> > level, i.e. their methods are 'synchronized'.
>>
>
> Can you work off 10 (master) first ?
>

The application is based on Spring Boot and my attempts to migrate it to
Jakarta EE (for Tomcat 10) were not successful.

But I've created a very simple Servlet and I will try it on Tomcat 10.x
soon:

import java.io.IOException;

// javax.* imports for Tomcat 9.0.x; Tomcat 10.x needs the jakarta.* equivalents
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/test")
public class PlainTextServlet extends HttpServlet {

    public static final String CONTENT = "Hello world!";
    public static final int CONTENT_LENGTH = CONTENT.length();

    @Override
    protected void doGet(final HttpServletRequest req,
                         final HttpServletResponse resp) throws IOException {
        resp.setContentType("text/plain");
        resp.setContentLength(CONTENT_LENGTH);
        // write() rather than println(), so the body matches the declared Content-Length
        resp.getWriter().write(CONTENT);
    }
}

For load testing I use wrk: wrk -c96 -t8 -d60s http://host:port/test
The GC JVM arguments are: -Xloggc:./gc.log -XX:+PrintGCDetails -verbose:gc
JDK: 1.8.0_252

The GC logs are:
- Tomcat 9.0.x (uses SynchronizedQueue/Stack):
https://gist.github.com/martin-g/d2570e7a6896e4d094ce548ceea3adb6
- Tomcat 9.0.x with my changes:
https://gist.github.com/martin-g/52c7d3a883b37e9bcd11ad6430800852
I've uploaded them to https://gceasy.io/ and the charts are similar.
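
For context, the GC-friendliness claim in their javadoc is, as far as I can
tell, that they reuse a plain Object[] for storage, while
java.util.concurrent.ConcurrentLinkedQueue allocates a new Node for every
offer(). A simplified sketch of that array-backed shape (not the real Tomcat
class, just the idea):

import java.util.Arrays;

public class ArrayBackedStack<T> {

    // One reusable array: no per-push allocation once it has grown
    // to its working size.
    private Object[] stack = new Object[128];
    private int index = -1;

    public synchronized boolean push(T obj) {
        if (index == stack.length - 1) {
            stack = Arrays.copyOf(stack, stack.length * 2); // rare, amortized
        }
        stack[++index] = obj;
        return true;
    }

    @SuppressWarnings("unchecked")
    public synchronized T pop() {
        if (index == -1) {
            return null;
        }
        T result = (T) stack[index];
        stack[index--] = null; // let the popped element be collected
        return result;
    }
}

For this workload the extra allocations of the lock-free variants apparently
do not show up in the GC charts.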

The summary from wrk with SynchronizedQueue/Stack is:

Running 1m test @ http://192.168.0.206:8080
  8 threads and 96 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.24ms  788.15us  19.46ms   91.44%
    Req/Sec    10.04k     1.64k   12.45k    79.48%
  4797193 requests in 1.00m, 600.19MB read
Requests/sec:  79834.45
Transfer/sec:      9.99MB
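
(As a sanity check on those numbers: 4797193 requests in ~60s is ~80K req/s,
and 600.19MB / 4797193 requests is roughly 130 bytes per response, i.e. the
small plain-text body plus the response headers.)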

I didn't preserve the summary from the lock-free run, but it was similar:
the throughput is around 80K req/sec, plus or minus a few thousand.
That is, the throughput does not improve after my changes!
Only the lock profiling results become better (Async Profiler:
./profiler.sh -e lock -d 50 -f lock-results.txt PID).
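
For those who don't want to open the branch diff, the shape of the change is
roughly the following (a simplified sketch assuming ConcurrentLinkedDeque as
the backing structure; the actual commits may differ in the details):

import java.util.concurrent.ConcurrentLinkedDeque;

/**
 * Sketch of a lock-free replacement for SynchronizedStack: same
 * push()/pop() contract (pop() returns null when empty), but backed by a
 * non-blocking java.util.concurrent structure instead of synchronized
 * methods.
 */
public class LockFreeStack<T> {

    private final ConcurrentLinkedDeque<T> deque = new ConcurrentLinkedDeque<>();

    public boolean push(T obj) {
        deque.push(obj); // CAS-based addFirst(), no monitor
        return true;
    }

    public T pop() {
        return deque.poll(); // pollFirst(); null when empty
    }

    public void clear() {
        deque.clear();
    }
}

The trade-off is that every push now allocates a deque node, which is exactly
the garbage the current classes were written to avoid, but as mentioned above
the GC charts look the same for this workload.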

Martin


>
>>
>> ~8 years ago, when I was profiling Tomcat and wrote those, having
>> something targeted to Tomcat's specific use case made a measurable
>> improvement.
>>
>
> Yes, the SyncStack was the go-to structure for all reuse. Not sure if it's
> the best.
>
> I already removed some of the usage in 10:
> - No more block poller and its pooling/recycling [I had to pull its
> removal out of 9 due to an unfortunate regression, this could be brought
> back now that it seems to work]
> - No recycling of trivial objects
>
> The main use should now be the recycled processors and the recycled
> channels (it would be bad to not reuse them since they may use direct
> buffers ...).
>
>
>>
>> I am more than prepared to believe that things have moved on since then
>> but I'd be interested in seeing the same numbers with the latest Java 8
>> from Adopt OpenJDK rather than Java 14.
>>
>
> +1, finally there could be some differences between JDKs, like now they
> started to actually improve NIO 1 again.
>
> Rémy
>
>
>>
>> I'd also like to see overall throughput numbers and the GC frequency for
>> each test scenario.
>>
>> Mark
>>
>>
>>
>> >
>> > The javadoc for those two classes says that they are preferred because
>> > they are GC-friendlier than
>> > java.util.concurrent.ConcurrentLinkedQueue.
>> >
>> > I use Async Profiler
>> > (https://github.com/jvm-profiling-tools/async-profiler) to profile CPU,
>> > memory, locks and a few lower-level event types (from perf).
>> > Issuing "./profiler.sh -e lock -d 60 PID" against 9.0.x shows reports
>> > like this:
>> >
>> > Started [lock] profiling
>> >
>> >  --- Execution profile ---
>> >
>> >  Total samples       : 713834
>> >
>> >   Frame buffer usage  : 0.0288%
>> >
>> > --- 1106165618177 ns (58.97%), 395609 samples
>> >   [ 0] org.apache.tomcat.util.collections.SynchronizedStack
>> >   [ 1] org.apache.tomcat.util.collections.SynchronizedStack.push
>> >   [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
>> >   [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
>> >   [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
>> >   [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
>> >   [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
>> >   [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
>> >   [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
>> >   [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end
>> >   [10] org.apache.coyote.http11.Http11OutputBuffer.end
>> >   [11] org.apache.coyote.http11.Http11Processor.finishResponse
>> >   [12] org.apache.coyote.AbstractProcessor.action
>> >   [13] org.apache.coyote.Response.action
>> >   [14] org.apache.catalina.connector.OutputBuffer.close
>> >   [15] org.apache.catalina.connector.Response.finishResponse
>> >   [16] org.apache.catalina.connector.CoyoteAdapter.service
>> >   [17] org.apache.coyote.http11.Http11Processor.service
>> >   [18] org.apache.coyote.AbstractProcessorLight.process
>> >   [19] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
>> >   [20] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
>> >   [21] org.apache.tomcat.util.net.SocketProcessorBase.run
>> >   [22] java.util.concurrent.ThreadPoolExecutor.runWorker
>> >   [23] java.util.concurrent.ThreadPoolExecutor$Worker.run
>> >   [24] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
>> >   [25] java.lang.Thread.run
>> >
>> > --- 769519847169 ns (41.03%), 314384 samples
>> >   [ 0] org.apache.tomcat.util.collections.SynchronizedStack
>> >   [ 1] org.apache.tomcat.util.collections.SynchronizedStack.pop
>> >   [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
>> >   [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
>> >   [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
>> >   [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
>> >   [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
>> >   [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
>> >   [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
>> >   [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end
>> >   [10] org.apache.coyote.http11.Http11OutputBuffer.end
>> >   [11] org.apache.coyote.http11.Http11Processor.finishResponse
>> >    ....
>> >
>> >
>> > I've made the following changes
>> > - https://github.com/apache/tomcat/compare/9.0.x...martin-g:feature/lock-free-queue-and-stack
>> > and ran the test suite - all green!
>> >
>> > Now the output from the locks profiling looks like:
>> >
>> >
>> > --- 72354435197 ns (47.98%), 4008 samples
>> >   [ 0] java.util.concurrent.locks.ReentrantLock$NonfairSync
>> >   [ 1] jdk.internal.misc.Unsafe.park
>> >   [ 2] java.util.concurrent.locks.LockSupport.park
>> >   [ 3] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire
>> >   [ 4] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly
>> >   [ 5] java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly
>> >   [ 6] java.util.concurrent.locks.ReentrantLock.lockInterruptibly
>> >   [ 7] java.util.concurrent.LinkedBlockingQueue.poll
>> >   [ 8] org.apache.tomcat.util.threads.TaskQueue.poll
>> >   [ 9] org.apache.tomcat.util.threads.TaskQueue.poll
>> >   [10] java.util.concurrent.ThreadPoolExecutor.getTask
>> >   [11] java.util.concurrent.ThreadPoolExecutor.runWorker
>> >   [12] java.util.concurrent.ThreadPoolExecutor$Worker.run
>> >   [13] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
>> >   [14] java.lang.Thread.run
>> >
>> > --- 58413748017 ns (38.74%), 10179 samples
>> >   [ 0] java.lang.Object
>> >   [ 1] org.apache.catalina.core.StandardContext$ContextFilterMaps.asArray
>> >   [ 2] org.apache.catalina.core.StandardContext.findFilterMaps
>> >   [ 3] org.apache.catalina.core.ApplicationFilterFactory.createFilterChain
>> >   [ 4] org.apache.catalina.core.StandardWrapperValve.invoke
>> >   [ 5] org.apache.catalina.core.StandardContextValve.invoke
>> >   [ 6] org.apache.catalina.authenticator.AuthenticatorBase.invoke
>> >   [ 7] org.apache.catalina.core.StandardHostValve.invoke
>> >   [ 8] org.apache.catalina.valves.ErrorReportValve.invoke
>> >   [ 9] org.apache.catalina.core.StandardEngineValve.invoke
>> >   [10] org.apache.catalina.connector.CoyoteAdapter.service
>> >   [11] org.apache.coyote.http11.Http11Processor.service
>> >   [12] org.apache.coyote.AbstractProcessorLight.process
>> >   [13] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
>> >   [14] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
>> >   [15] org.apache.tomcat.util.net.SocketProcessorBase.run
>> >   [16] java.util.concurrent.ThreadPoolExecutor.runWorker
>> >   [17] java.util.concurrent.ThreadPoolExecutor$Worker.run
>> >   [18] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
>> >   [19] java.lang.Thread.run
>> >
>> > --- 5856040617 ns (3.88%), 11395 samples
>> >   [ 0] org.apache.tomcat.util.collections.SynchronizedQueue
>> >   [ 1] org.apache.tomcat.util.collections.SynchronizedQueue.offer
>> >   [ 2] org.apache.tomcat.util.net.NioEndpoint$Poller.addEvent
>> >   [ 3] org.apache.tomcat.util.net.NioEndpoint$Poller.add
>> >   [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.registerReadInterest
>> >   [ 5] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
>> >   ...
>> >
>> >
>> > I monitor the GC both with -Xloggc:gc.log and with JMX and the patterns
>> > are the same, no new spikes.
>> > JDK 14 with G1GC.
>> >
>> > I load the application with wrk (https://github.com/wg/wrk/) and it
>> > serves 70-80K requests per second.
>> >
>> > I guess there is a reason to use these custom synchronized data
>> > structures but so far I don't see any drawbacks after moving to the
>> > lock-free impls.
>> >
>> > Regards,
>> > Martin
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: dev-h...@tomcat.apache.org
>>
>>
