I have observed a significant degradation in GC performance after upgrading
from JBoss 4.2.3.GA to WildFly 10.0.0.Final.  I'm 99% sure the difference is
due to Artemis ActiveMQ making heavy use of PhantomReference objects
(directly or indirectly).  I'm raising this discussion to see if anyone else
can observe the same behaviour and determine whether to raise an issue
against Artemis or one of its dependencies.

What's changed?


I have JBoss 4.2.3.GA (with jboss-messaging-1.4.5.GA) and WildFly
10.0.0.Final running on identical hardware with identical JVMs (8u71) and GC
settings (using G1GC).  The application is only minimally changed as
required by the API differences between the old and new servers.  The
application load is identical and results in near identical memory
allocation rate and young GC frequency.

*The difference is that on JBoss each young GC takes around 10-12ms whereas
on WildFly they take 16-20ms.*  The additional time is taken in the Object
Copy and Ref Proc phases.  This indicates that there are more Survivor
objects on each GC and more References being processed.  I'm pretty sure the
additional Survivor objects are also the Reference objects.  Running a Java
Mission Control (JMC) Flight Recording on these two systems under identical
production load, WildFly is showing around 11,000 PhantomReferences
processed each GC versus a handful on JBoss.  The number of
PhantomReferences seems to approximately correspond to the number of JMS
messages transmitted between GCs.

Some evidence


In a separate test environment I have WildFly running under a standard test
load.  It is emitting JMS messages at a consistent rate (lower than
production) onto a non-durable Topic.  The following image shows a 2 minute
JMC recording when there is no subscriber on the Topic.  We can see that
there are very few PhantomReferences and the GCs take around 4ms.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithoutConsumer.png> 

However once I start up an external JMS client subscribed to the Topic the
PhantomReferences appear and the GCs slow down as shown in the next image. 
The messaging is non-XA with auto-ack.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithConsumer.png> 

Where do the PhantomReferences come from?


I used the debugger to capture where the PhantomReferences are being
created.  This seems to indicate they are generated by Artemis and/or the
associated XNIO remoting...

Thread [Thread-1
(ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@788d198e-832298282)]
(Suspended (breakpoint at line 80 in PhantomReference)) 
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 
80      
        Cleaner.<init>(Object, Runnable) line: 115      
        Cleaner.create(Object, Runnable) line: 133      
        DirectByteBuffer.<init>(int) line: 139  
        ByteBuffer.allocateDirect(int) line: 311        
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108       
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69 
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50     
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, 
int)
line: 155       
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) 
line:
146     
        NettyServerConnection.createTransportBuffer(int) line: 38       

RemotingConnectionImpl(AbstractRemotingConnection).createTransportBuffer(int)
line: 162       
        SessionReceiveMessage.encode(RemotingConnection) line: 59       
        ChannelImpl.send(Packet, boolean, boolean) line: 225    
        ChannelImpl.sendBatched(Packet) line: 205       
        CoreSessionCallback.sendMessage(ServerMessage, ServerConsumer, int) 
line:
84      
        ServerConsumerImpl.deliverStandardMessage(MessageReference, 
ServerMessage)
line: 883       
        ServerConsumerImpl.proceedDeliver(MessageReference) line: 366   
        QueueImpl.proceedDeliver(Consumer, MessageReference) line: 2358 
        QueueImpl.deliver() line: 1873  
        QueueImpl.access$1400(QueueImpl) line: 97       
        QueueImpl$DeliverRunner.run() line: 2581        
        OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run() line: 100     
        ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1142      
        ThreadPoolExecutor$Worker.run() line: 617       
        Thread.run() line: 745  
Thread [default I/O-5] (Suspended (breakpoint at line 80 in
PhantomReference))      
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 
80      
        Cleaner.<init>(Object, Runnable) line: 115      
        Cleaner.create(Object, Runnable) line: 133      
        DirectByteBuffer.<init>(int) line: 139  
        ByteBuffer.allocateDirect(int) line: 311        
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108       
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69 
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50     
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, 
int)
line: 155       
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) 
line:
146     
        UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 
107      
        AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line:
104     

AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel)
line: 426       
        AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371   
        ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) 
line:
92      
        ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66 
        NioSocketConduit.handleReady(int) line: 88      
        WorkerThread.run() line: 559    
Thread [default I/O-4] (Suspended (breakpoint at line 80 in
PhantomReference))      
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 
80      
        Cleaner.<init>(Object, Runnable) line: 115      
        Cleaner.create(Object, Runnable) line: 133      
        DirectByteBuffer.<init>(int) line: 139  
        ByteBuffer.allocateDirect(int) line: 311        
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108       
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69 
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50     
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, 
int)
line: 155       
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) 
line:
146     
        UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 
107      
        AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line:
104     

AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel)
line: 426       
        AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371   
        ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) 
line:
92      
        ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66 
        NioSocketConduit.handleReady(int) line: 88      
        WorkerThread.run() line: 559    
Thread [default I/O-2] (Suspended (breakpoint at line 80 in
PhantomReference))      
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 
80      
        Cleaner.<init>(Object, Runnable) line: 115      
        Cleaner.create(Object, Runnable) line: 133      
        DirectByteBuffer.<init>(int) line: 139  
        ByteBuffer.allocateDirect(int) line: 311        
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108       
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69 
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50     
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, 
int)
line: 155       
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) 
line:
146     
        UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 
107      
        AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line:
104     

AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel)
line: 426       
        AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371   
        ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) 
line:
92      
        ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66 
        NioSocketConduit.handleReady(int) line: 88      
        WorkerThread.run() line: 559
        
What's next?


So it's seems pretty certain that this is where the change is coming from. 
The total GC load is still low but we're aiming for consistent low latency
and this is a backwards step.

I'd be interested in any comments or questions.  I can provide more info if
required.

Thanks.



--
View this message in context: 
http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Reply via email to