I have observed a significant degradation in GC performance after upgrading from JBoss 4.2.3.GA to WildFly 10.0.0.Final. I'm 99% sure the difference is due to Artemis ActiveMQ making heavy use of PhantomReference objects (directly or indirectly). I'm raising this discussion to see if anyone else can observe the same behaviour and determine whether to raise an issue against Artemis or one of its dependencies.
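For background on why this matters for pause times: every PhantomReference that is live at a young GC must be discovered during the Ref Proc phase, and enqueued once its referent dies. Here is a standalone sketch that makes this visible (the class name, counts, and sleep are mine, not anything from Artemis; enqueue counts after a `System.gc()` hint are not guaranteed, so treat the printed number as indicative):

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;

public class PhantomCost {
    /**
     * Creates n short-lived objects, each tracked by a PhantomReference,
     * then requests a GC and counts how many references were enqueued.
     * Every one of these references has to be walked by the collector's
     * reference-processing (Ref Proc) phase, whether or not it is enqueued.
     */
    static int countEnqueued(int n) {
        ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
        List<PhantomReference<byte[]>> refs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // The referent is unreachable immediately; only the reference survives.
            refs.add(new PhantomReference<>(new byte[256], queue));
        }
        System.gc();
        try {
            Thread.sleep(200); // give the ReferenceHandler thread time to enqueue
        } catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
        }
        int enqueued = 0;
        while (queue.poll() != null) {
            enqueued++;
        }
        return enqueued;
    }

    public static void main(String[] args) {
        System.out.println("phantom refs enqueued after GC: " + countEnqueued(10_000));
    }
}
```

Scaling n up while watching GC logs or a JFR recording shows the Ref Proc time growing with the number of phantom references created per cycle, which is the shape of the regression described below.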
What's changed?

I have JBoss 4.2.3.GA (with jboss-messaging-1.4.5.GA) and WildFly 10.0.0.Final running on identical hardware with identical JVMs (8u71) and GC settings (using G1GC). The application is only minimally changed, as required by the API differences between the old and new servers. The application load is identical and results in a near-identical memory allocation rate and young GC frequency.

*The difference is that on JBoss each young GC takes around 10-12ms, whereas on WildFly they take 16-20ms.* The additional time is spent in the Object Copy and Ref Proc phases, which indicates that each GC has more Survivor objects and more References to process. I'm pretty sure the additional Survivor objects are themselves the Reference objects.

Running a Java Mission Control (JMC) Flight Recording on these two systems under identical production load, WildFly shows around 11,000 PhantomReferences processed on each GC versus a handful on JBoss. The number of PhantomReferences approximately corresponds to the number of JMS messages transmitted between GCs.

Some evidence

In a separate test environment I have WildFly running under a standard test load, emitting JMS messages at a consistent rate (lower than production) onto a non-durable Topic. The following image shows a 2-minute JMC recording when there is no subscriber on the Topic. There are very few PhantomReferences and the GCs take around 4ms.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithoutConsumer.png>

However, once I start up an external JMS client subscribed to the Topic, the PhantomReferences appear and the GCs slow down, as shown in the next image. The messaging is non-XA with auto-ack.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithConsumer.png>

Where do the PhantomReferences come from?

I used the debugger to capture where the PhantomReferences are being created. This seems to indicate they are generated by Artemis and/or the associated XNIO remoting:
Thread [Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@788d198e-832298282)] (Suspended (breakpoint at line 80 in PhantomReference))
	Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80
	Cleaner.<init>(Object, Runnable) line: 115
	Cleaner.create(Object, Runnable) line: 133
	DirectByteBuffer.<init>(int) line: 139
	ByteBuffer.allocateDirect(int) line: 311
	UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
	UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69
	UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int) line: 155
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line: 146
	NettyServerConnection.createTransportBuffer(int) line: 38
	RemotingConnectionImpl(AbstractRemotingConnection).createTransportBuffer(int) line: 162
	SessionReceiveMessage.encode(RemotingConnection) line: 59
	ChannelImpl.send(Packet, boolean, boolean) line: 225
	ChannelImpl.sendBatched(Packet) line: 205
	CoreSessionCallback.sendMessage(ServerMessage, ServerConsumer, int) line: 84
	ServerConsumerImpl.deliverStandardMessage(MessageReference, ServerMessage) line: 883
	ServerConsumerImpl.proceedDeliver(MessageReference) line: 366
	QueueImpl.proceedDeliver(Consumer, MessageReference) line: 2358
	QueueImpl.deliver() line: 1873
	QueueImpl.access$1400(QueueImpl) line: 97
	QueueImpl$DeliverRunner.run() line: 2581
	OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run() line: 100
	ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1142
	ThreadPoolExecutor$Worker.run() line: 617
	Thread.run() line: 745

Thread [default I/O-5] (Suspended (breakpoint at line 80 in PhantomReference))
	Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80
	Cleaner.<init>(Object, Runnable) line: 115
	Cleaner.create(Object, Runnable) line: 133
	DirectByteBuffer.<init>(int) line: 139
	ByteBuffer.allocateDirect(int) line: 311
	UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
	UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69
	UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int) line: 155
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line: 146
	UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107
	AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line: 104
	AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel) line: 426
	AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371
	ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line: 92
	ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66
	NioSocketConduit.handleReady(int) line: 88
	WorkerThread.run() line: 559

Thread [default I/O-4] and Thread [default I/O-2] were suspended at the same breakpoint with stacks identical to the default I/O-5 thread above.

What's next?

So it seems pretty certain that this is where the change comes from. The total GC load is still low, but we're aiming for consistently low latency and this is a step backwards. I'd be interested in any comments or questions. I can provide more info if required. Thanks.

--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
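P.S. For anyone who wants to reproduce the reference load without a full WildFly installation: the traces all bottom out in ByteBuffer.allocateDirect, and every direct buffer allocated that way registers a sun.misc.Cleaner, which is a PhantomReference subclass. A minimal standalone sketch of that allocation pattern (the class name and counts are mine; this is not Artemis or Netty code, just the same JDK-level mechanism):

```java
import java.nio.ByteBuffer;

public class DirectBufferCost {
    /**
     * Allocates direct buffers the way an unpooled allocator does under
     * the hood and lets them become garbage immediately. Each
     * ByteBuffer.allocateDirect call registers a sun.misc.Cleaner (a
     * PhantomReference subclass), so the next GC has one extra reference
     * per buffer to process. Returns the total bytes allocated.
     */
    public static long churn(int buffers, int size) {
        long total = 0;
        for (int i = 0; i < buffers; i++) {
            ByteBuffer b = ByteBuffer.allocateDirect(size); // registers a Cleaner
            b.put(0, (byte) 1); // touch the buffer so the allocation isn't dead code
            total += b.capacity();
        }
        return total;
    }

    public static void main(String[] args) {
        // Roughly one phantom reference per "message", matching the JFR observation.
        long bytes = churn(10_000, 256);
        System.out.println("allocated " + bytes + " direct bytes across 10000 buffers");
        System.gc(); // the Ref Proc phase now has ~10k Cleaners to discover
    }
}
```

Running this in a loop under JFR shows the same PhantomReference counts per GC as the WildFly recordings, which is why I believe the per-message direct-buffer allocation is the source rather than anything application-side.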