Re: Performance issue 8.5.20 (metaspace related?)

2017-08-31 Thread Ing. Andrea Vettori
> On 31 Aug 2017, at 17:55, Christopher Schultz <ch...@christopherschultz.net> 
> wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Andrea,
> 
> On 8/30/17 4:13 PM, Ing. Andrea Vettori wrote:
>>> On 30 Aug 2017, at 00:16, Christopher Schultz
>>> <ch...@christopherschultz.net> wrote: RMI is known for flagrantly
>>> wasting permgen/metaspace because of all the Proxy objects and
>>> special throw-away Class objects it uses, etc. But I'm not sure
>>> why the metaspace filling up would have such a dramatic effect on
>>> performance. If you were stalling during a metaspace collection,
>>> it might make sense, but that's not what you are reporting.
>> 
>> That’s true in fact what I’d like to understand is if a big
>> metaspace can affect performance… one thing I want to try is to run
>> the app with metaspace capped at 400/500MB. On java 7 I had PermGen
>> at 512 and as far as I remember, it was working fine.
> 
> Yes, I think limiting the metaspace might help. It may result in more
> frequent, smaller GC cycles.
> 
> I actually don't know much about the (relatively) new metaspace
> region, but it's possible that GC in the metaspace is a "stop the
> world" operation where the JVM can't do anything else during that time.

I tried that this morning. I set one Tomcat with max metaspace set to 512MB and 
left the other one at 2GB. The one capped at 512MB was faster on average when 
calling JBoss, but it triggered GC a lot more often, so some calls were very 
slow (the ones during which the GC ran).
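
For reference, a minimal sketch of the only flags that differed between the two 
instances in this test; the rest of the VM options are the ones posted elsewhere 
in the thread (the 256m initial size for the capped instance is an assumption, 
not a value from the thread):

  capped instance:    -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m
  unchanged instance: -XX:MetaspaceSize=1g   -XX:MaxMetaspaceSize=2g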

If you’ve seen my other reply, the performance problem has been fixed, but the 
curiosity to understand why a big metaspace is slow remains…

Thanks





Re: Performance issue 8.5.20 (metaspace related?)

2017-08-31 Thread Ing. Andrea Vettori
> On 29 Aug 2017, at 14:24, Mark Thomas <ma...@apache.org> wrote:
> 
> On 29/08/17 13:09, Ing. Andrea Vettori wrote:
>>> On 29 Aug 2017, at 12:29, Suvendu Sekhar Mondal <suv3...@gmail.com> wrote:
>>> 
>>> On Tue, Aug 29, 2017 at 2:54 PM, Ing. Andrea Vettori
>>> <a.vett...@b2bires.com> wrote:
>>>> - with a fresh started tomcat instance, the time it takes is around 0,8 
>>>> seconds. Most of the time is spent on the two RMI calls the task does.
>>>> - with an instance that is running from some time, the time can reach 2/3 
>>>> seconds; occasionally 5/6 seconds. Most time is still spent on RMI calls. 
>>>> I.e. what slows down are the RMI calls.
>>> 
>>> Silly question: do you mean RMI calls originating from Tomcat are getting
>>> slower with time, or is JBoss taking time to return the response?
>> 
>> Thanks for your help.
>> What I see is that the http requests are slower. Looking at the details of 
>> that specific request (putting a product in the cart) using System.nanotime 
>> I can see that most of the time is spent during the RMI calls. I’m pretty 
>> sure it’s not jboss that is slower because doing other calls at the same 
>> time with fresh tomcats or with a java client they're not slow.
> 
> I'd try profiling the problematic code. Given the magnitude of the
> times, sampling should be sufficient which means you could do this on
> production when the problem happens. You'll probably need to configure
> it to log JVM internal calls.


Profiling the code, I have been able to find the cause of the big metaspace 
garbage.
Due to a bug, we were not caching the remote interfaces when connecting to JBoss 
from the web sites; the other kinds of clients were fine.
I fixed this problem today, and the system has now been running fine for a few 
hours. It's fast (faster than before, even compared with a fresh Tomcat); 
metaspace stays low and everything is fine.
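
For anyone hitting the same issue, a minimal sketch of the kind of fix described 
above: caching the remote interface proxies obtained from JNDI instead of looking 
them up on every request. The class name and JNDI name below are hypothetical; 
the actual fix in our code is not shown here.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class RemoteInterfaceCache {

    // One proxy per JNDI name, shared across requests instead of being
    // re-created for every call.
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public static <T> T lookup(String jndiName) throws NamingException {
        Object proxy = CACHE.get(jndiName);
        if (proxy == null) {
            // Each lookup goes to JNDI and builds a new remote proxy; doing
            // this on every request is what produced the metaspace garbage
            // in our case.
            InitialContext ctx = new InitialContext();
            try {
                proxy = ctx.lookup(jndiName);
            } finally {
                ctx.close();
            }
            Object existing = CACHE.putIfAbsent(jndiName, proxy);
            if (existing != null) {
                proxy = existing;
            }
        }
        return (T) proxy;
    }
}

Usage would be something like
MyCartService cart = RemoteInterfaceCache.lookup("ecommerce/CartServiceBean/remote");
where MyCartService and the JNDI name are again just placeholders.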

This solves my problem. Thanks to everybody who helped with comments and 
suggestions.

Still, I'm very curious to know why a big metaspace causes such increasing 
slowness… looking at the GC logs it does not seem to be a GC problem… maybe the 
JVM's handling of metaspace is not optimal when it's big?






Re: Performance issue 8.5.20 (metaspace related?)

2017-08-31 Thread Ing. Andrea Vettori
> On 31 Aug 2017, at 15:24, Suvendu Sekhar Mondal  wrote:
> 
> I will suggest that if you have some budget, please get a decent APM
> like AppDynamics, New Relic, Dynatrace to monitor your prod system
> 24x7x365. Trust me, you will be able to identify and solve this type
> of sporadic slowness issue very quickly.
> 


Thank you. I've been using ‘perfino’ for a couple of days and it seems very good.





Re: Performance issue 8.5.20 (metaspace related?)

2017-08-30 Thread Ing. Andrea Vettori

> On 30 Aug 2017, at 16:38, Suvendu Sekhar Mondal  wrote:
> 
> 
> In the Hotspot VM, a Full GC will collect "dead" objects/classes from all
> the regions (Young, Old, Metaspace/PermGen). In the provided GC log, I
> can see that you had triggered Full GCs but Metaspace did not shrink
> much. That clearly indicates you had a good number of classes that were
> "in use":
> 
> 48318.001: [Full GC (Heap Dump Initiated GC)  5232M->1298M(20G), 4.9784080 
> secs]
>   [Eden: 2800.0M(3136.0M)->0.0B(9616.0M) Survivors: 120.0M->0.0B
> Heap: 5232.7M(20.0G)->1298.9M(20.0G)], [Metaspace:
> 1419958K->1398648K(2621440K)]
> [Times: user=9.55 sys=0.09, real=4.98 secs]
> 48404.270: [Full GC (System.gc())  2825M->1301M(20G), 4.3863751 secs]
>   [Eden: 1528.0M(9616.0M)->0.0B(9616.0M) Survivors: 0.0B->0.0B Heap:
> 2825.2M(20.0G)->1301.6M(20.0G)], [Metaspace:
> 1425628K->1425623K(2627584K)]
> [Times: user=9.09 sys=0.15, real=4.38 secs]
> 
> Sometimes Full GC were able to clean up some space from Metaspace but
> only as part of a final last ditch collection effort:
> 
> 43618.504: [Full GC (Last ditch collection)  1386M->250M(20G), 1.6455823 secs]
>   [Eden: 0.0B(6408.0M)->0.0B(6408.0M) Survivors: 0.0B->0.0B Heap:
> 1386.7M(20.0G)->250.5M(20.0G)], [Metaspace:
> 1646471K->163253K(1843200K)]
> [Times: user=2.23 sys=0.10, real=1.65 secs]
> 49034.392: [Full GC (Last ditch collection)  1491M->347M(20G), 1.9965534 secs]
>   [Eden: 0.0B(5600.0M)->0.0B(5600.0M) Survivors: 0.0B->0.0B Heap:
> 1491.1M(20.0G)->347.9M(20.0G)], [Metaspace:
> 1660804K->156199K(2031616K)]
> [Times: user=2.78 sys=0.10, real=2.00 secs]

This is interesting, because I see this pattern for the metaspace GC:

https://ibb.co/b2B9HQ

There is something that clears it, but it seems it's not the full GC. I don't 
think anything happens in the app that would cause that much metaspace to become 
dead all at once. So I think the full GC is not cleaning the metaspace as 
thoroughly as it could. Maybe it's only a light clean…


 Tomcat is running with these VM options:
 
 -Xms20480m -Xmx20480m -Dsun.rmi.dgc.client.gcInterval=360 
 -Dsun.rmi.dgc.server.gcInterval=360 -Xloggc:/tmp/gc.txt 
 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9012 
 -Dcom.sun.management.jmxremote.ssl=false 
 -Dcom.sun.management.jmxremote.authenticate=false -XX:+PrintGCDetails 
 -XX:+PrintGCTimeStamps -XX:+UseG1GC -XX:ReservedCodeCacheSize=1g 
 -XX:InitialCodeCacheSize=256m -XX:+UseHugeTLBFS -XX:MetaspaceSize=1g 
 -XX:MaxMetaspaceSize=2g
 
 Some of the options have been recently added (for example the increase in 
 code cache  size) but it seems they had no impact on the issue.
 
>>> 
>>> Is there any specific reason to trigger Full GC every
>>> 60min(-Dsun.rmi.dgc.client.gcInterval=360)? I think that's
>>> overkill.
>> 
>> That’s a value I saw “everywhere” as suggested value (the default is too 
>> short). Note that on the GC logs I don’t see full GCs every hour so maybe 
>> that option is not triggering full gc ?
> 
> That is the default value. Please see:
> http://docs.oracle.com/javase/8/docs/technotes/guides/rmi/sunrmiproperties.html
>  
> 

They must have changed it recently; it was 180k when I first checked that 
option. Do you think it's worth increasing it? I don't see a full GC every 
hour…

Thanks



Re: Performance issue 8.5.20 (metaspace related?)

2017-08-30 Thread Ing. Andrea Vettori

> On 30 Aug 2017, at 01:13, Caldarale, Charles R wrote:
> 
> Is it possible that the system is getting into swapping?  The heap has been
> set to 20 GiB, but I didn't see any mention of how much actual memory the
> system has.  Do you really need a 20 GiB heap?  Sometimes smaller is better.
> Might also want to try turning off UseHugeTLBFS.

No swap is used on the server; it has 32GB of RAM. I think the heap usage is at 
most 15-16GB… in the past I tried various heap sizes and I think this size works 
well.

> 
> I wonder if there are heap objects tied up due to dead but not collected
> metaspace items; when metaspace is GC'd, the heap usage might well shrink
> also.  (This might be a G1GC bug, but that seems unlikely.)  Perhaps the RMI
> mechanism (which I haven't looked at in many years) is searching an
> ever-growing set of soft/weak references that are removed when metaspace is
> collected.  A smaller heap might help there.

Thanks for your suggestions



Re: Performance issue 8.5.20 (metaspace related?)

2017-08-30 Thread Ing. Andrea Vettori

> On 29 Aug 2017, at 14:24, Mark Thomas <ma...@apache.org> wrote:
> 
> On 29/08/17 13:09, Ing. Andrea Vettori wrote:
>>> On 29 Aug 2017, at 12:29, Suvendu Sekhar Mondal <suv3...@gmail.com> wrote:
>>> 
>>> On Tue, Aug 29, 2017 at 2:54 PM, Ing. Andrea Vettori
>>> <a.vett...@b2bires.com> wrote:
>>>> - with a fresh started tomcat instance, the time it takes is around 0,8 
>>>> seconds. Most of the time is spent on the two RMI calls the task does.
>>>> - with an instance that is running from some time, the time can reach 2/3 
>>>> seconds; occasionally 5/6 seconds. Most time is still spent on RMI calls. 
>>>> I.e. what slows down are the RMI calls.
>>> 
>>> Silly question: do you mean RMI calls originating from Tomcat are getting
>>> slower with time, or is JBoss taking time to return the response?
>> 
>> Thanks for your help.
>> What I see is that the http requests are slower. Looking at the details of 
>> that specific request (putting a product in the cart) using System.nanotime 
>> I can see that most of the time is spent during the RMI calls. I’m pretty 
>> sure it’s not jboss that is slower because doing other calls at the same 
>> time with fresh tomcats or with a java client they're not slow.
> 
> I'd try profiling the problematic code. Given the magnitude of the
> times, sampling should be sufficient which means you could do this on
> production when the problem happens. You'll probably need to configure
> it to log JVM internal calls.
> 
> I use YourKit (because they give me a free license to use to develop
> Tomcat). Other profilers are available.
> 
> Mark


Thank you, I’m going to try this soon and report back what I find.



Re: Performance issue 8.5.20 (metaspace related?)

2017-08-30 Thread Ing. Andrea Vettori
> On 30 Aug 2017, at 00:16, Christopher Schultz wrote:
> 
>>> 
>>> Another thing, is there any way you can measure the processing
>>> time in JBoss for each RMI calls?
>>> 
>> 
>> At the moment I’m not measuring the time on the jboss side but I’m
>> considering using a telemetry product in the future. I’m measuring
>> the time on the client side (tomcat or fat client).
> 
> I think it's critical to get metrics on the JBoss side, otherwise you
> don't even know where the problem might be. It's good information to
> have that the application running within Tomcat performs well
> immediately following a metaspace collection

This, and the fact that the other “clients” are not slow.

> , but it's not enough
> information to inform a good decision.
> 
> RMI is known for flagrantly wasting permgen/metaspace because of all
> the Proxy objects and special throw-away Class objects it uses, etc.
> But I'm not sure why the metaspace filling up would have such a
> dramatic effect on performance. If you were stalling during a
> metaspace collection, it might make sense, but that's not what you are
> reporting.

That's true; in fact, what I'd like to understand is whether a big metaspace can 
affect performance… one thing I want to try is to run the app with metaspace 
capped at 400/500MB. On Java 7 I had PermGen at 512MB and, as far as I remember, 
it was working fine.
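
As a toy illustration of the mechanism you describe (not our application code): 
RMI stubs are java.lang.reflect.Proxy instances, and every distinct generated 
proxy class is class metadata that lives in metaspace until its class loader can 
be collected.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    public static void main(String[] args) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            System.out.println("invoked " + method.getName());
            return null;
        };
        Runnable stub = (Runnable) Proxy.newProxyInstance(
                Runnable.class.getClassLoader(),
                new Class<?>[] { Runnable.class },
                handler);
        stub.run();
        // The proxy class itself (e.g. com.sun.proxy.$Proxy0) is generated at
        // runtime and stored in metaspace like any other class.
        System.out.println(stub.getClass().getName());
    }
}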

> It's like saying that HTTP traffic slows down on your PC
> until you empty your Recycle Bin.
> 
> At any rate, this is not a problem with Tomcat itself: this is likely
> entirely JVM-related. I'm sure everyone is happy to discuss it, here
> (on a Tomcat-specific mailing list), but there aren't any
> Tomcat-related settings that you will be able to tweak that affect
> your environment in any way.


Thank you for your suggestions!



Re: Performance issue 8.5.20 (metaspace related?)

2017-08-29 Thread Ing. Andrea Vettori
> On 29 Aug 2017, at 12:29, Suvendu Sekhar Mondal <suv3...@gmail.com> wrote:
> 
> On Tue, Aug 29, 2017 at 2:54 PM, Ing. Andrea Vettori
> <a.vett...@b2bires.com> wrote:
>> - with a fresh started tomcat instance, the time it takes is around 0,8 
>> seconds. Most of the time is spent on the two RMI calls the task does.
>> - with an instance that is running from some time, the time can reach 2/3 
>> seconds; occasionally 5/6 seconds. Most time is still spent on RMI calls. 
>> I.e. what slows down are the RMI calls.
> 
> Silly question: do you mean RMI calls originating from Tomcat are getting
> slower with time, or is JBoss taking time to return the response?

Thanks for your help.
What I see is that the HTTP requests are slower. Looking at the details of that 
specific request (putting a product in the cart) using System.nanoTime, I can 
see that most of the time is spent in the RMI calls. I'm pretty sure it's not 
JBoss that is slower, because other calls done at the same time from fresh 
Tomcats or from a Java client are not slow.

> 
>> - restarting the jvm fixes the issue
>> - ***it seems*** but I’m still testing this since it seems there’s no 
>> ‘metaspace gc trigger command available', that when Metaspace is garbage 
>> collected, tomcat then performs like a fresh instance.
>> 
> 
> Yes, no command to only garbage collect Metaspace area. Without
> recycling Tomcat, you can trigger a Full GC either using VisualVM or
> jcmd's GC.run command.

I think that an explicit GC is not doing a metaspace GC… that's what I observed 
when triggering the GC from the Java console.

> 
>> Since we’re using more than one tomcat instance (2 in production for this 
>> website, 1 for development) I can see that the issue is isolated to Tomcat 
>> or the JVM/Host where it runs because other Tomcat instances behave well at 
>> the same time one is slow. The same JBoss/Postgres backend is used by java 
>> batches and fat clients and it does work well with consistent times.
>> 
>> To clarify: the moment one production tomcat that is running from some time 
>> finishes the task in 3 seconds, the development tomcat or a fresh started 
>> production tomcat instance does the same task in less than one second. Note 
>> that repeating the task gives always consistent results, i.e. the instance 
>> is running from some time is always slow,  the fresh running instance is 
>> always fast.
>> 
>> Tomcat is running with these VM options:
>> 
>> -Xms20480m -Xmx20480m -Dsun.rmi.dgc.client.gcInterval=360 
>> -Dsun.rmi.dgc.server.gcInterval=360 -Xloggc:/tmp/gc.txt 
>> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9012 
>> -Dcom.sun.management.jmxremote.ssl=false 
>> -Dcom.sun.management.jmxremote.authenticate=false -XX:+PrintGCDetails 
>> -XX:+PrintGCTimeStamps -XX:+UseG1GC -XX:ReservedCodeCacheSize=1g 
>> -XX:InitialCodeCacheSize=256m -XX:+UseHugeTLBFS -XX:MetaspaceSize=1g 
>> -XX:MaxMetaspaceSize=2g
>> 
>> Some of the options have been recently added (for example the increase in 
>> code cache  size) but it seems they had no impact on the issue.
>> 
> 
> Is there any specific reason to trigger Full GC every
> 60min(-Dsun.rmi.dgc.client.gcInterval=360)? I think that's
> overkill.

That's a value I saw “everywhere” as the suggested value (the default is too 
short). Note that in the GC logs I don't see full GCs every hour, so maybe that 
option is not triggering a full GC?


>> Metaspace goes up to 1,6GB before being collected. Value after garbage 
>> collect is around 200MB. Heap usage is variable, it usually stays under 10G 
>> and is around 1G after garbage collect.
>> CPU usage rarely goes over 10%. Loaded classes between 20k and 40k. Active 
>> sessions around 100/120 for each instance.
>> 
> 
> That sounds like a healthy behavior to me. That means GC is occurring
> and Metaspace is getting cleaned up. Why do you think Metaspace is the
> problem?

Because I can observe that when the metaspace is collected, the requests become 
fast. I observed that a few hours ago, looking at the metaspace graph in the 
Java console and making requests just after the collection.
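
The same number the JConsole graph shows can also be read from inside the JVM 
through the standard java.lang.management API; a small sketch (the pool name 
"Metaspace" is the HotSpot name on Java 8):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MetaspaceProbe {
    public static void main(String[] args) {
        // Find the Metaspace memory pool and print its current usage.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                System.out.printf("Metaspace used=%d MB committed=%d MB%n",
                        usage.getUsed() >> 20, usage.getCommitted() >> 20);
            }
        }
    }
}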

> From the given data, my first impression is JVM is not having any
> memory pressure. If you can post your gc.txt file to this group, I
> would be happy to check it.

Thanks, here's a link to the gceasy report: 
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDgvMjkvLS1nY3N0YXRzLnR4dC5nei0tMTEtNDktMzQ=
and the txt log file: http://ge.tt/1Un0DLm2
Please consider that I manually triggered a Full GC two times on this server.


> 
> Another thing, is there any way you can measure the processing time in
> JBoss for each RMI calls?
> 

At the moment I’m not measuring the time on the jboss side but I’m considering 
using a telemetry product in the future. I’m measuring the time on the client 
side (tomcat or fat client).

Thank you!
—
Andrea Vettori



Performance issue 8.5.20 (metaspace related?)

2017-08-29 Thread Ing. Andrea Vettori
Hello,
We're running several instances of Tomcat 8.5.20 / JDK 8u144 / CentOS 7 at our 
company for various web sites on many hosts. Recently I've been trying to 
understand a performance problem we're having on our e-commerce web site.
The configuration is the following:

HAProxy   <—> 2x Tomcat 8.5.20 <—>  JBoss 5.1 EJB <—> Postgres 9.6

Tomcat runs a web site built with Struts / Freemarker that calls JBoss EJBs via 
RMI.
Monitoring a specific task (putting a product in the cart) I see the following:

- with a freshly started Tomcat instance, the task takes around 0.8 seconds. 
Most of the time is spent on the two RMI calls the task does.
- with an instance that has been running for some time, the time can reach 2-3 
seconds, occasionally 5-6 seconds. Most of the time is still spent on the RMI 
calls, i.e. what slows down are the RMI calls.
- restarting the JVM fixes the issue
- ***it seems*** (I'm still testing this, since there appears to be no 
‘metaspace GC trigger' command available) that when Metaspace is garbage 
collected, Tomcat then performs like a fresh instance.

Since we're using more than one Tomcat instance (2 in production for this 
website, 1 for development) I can see that the issue is isolated to Tomcat or 
the JVM/host where it runs, because the other Tomcat instances behave well at 
the same time one is slow. The same JBoss/Postgres backend is used by Java 
batches and fat clients, and it works well there with consistent times.

To clarify: the moment one production Tomcat that has been running for some 
time finishes the task in 3 seconds, the development Tomcat or a freshly started 
production Tomcat instance does the same task in less than one second. Note that 
repeating the task always gives consistent results, i.e. the instance that has 
been running for some time is always slow, and the freshly started instance is 
always fast.

Tomcat is running with these VM options: 

-Xms20480m -Xmx20480m -Dsun.rmi.dgc.client.gcInterval=360 
-Dsun.rmi.dgc.server.gcInterval=360 -Xloggc:/tmp/gc.txt 
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9012 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps -XX:+UseG1GC -XX:ReservedCodeCacheSize=1g 
-XX:InitialCodeCacheSize=256m -XX:+UseHugeTLBFS -XX:MetaspaceSize=1g 
-XX:MaxMetaspaceSize=2g

Some of the options have been added recently (for example the increase in code 
cache size), but it seems they had no impact on the issue.

Metaspace goes up to 1.6GB before being collected. The value after a garbage 
collection is around 200MB. Heap usage is variable: it usually stays under 10GB 
and is around 1GB after a garbage collection.
CPU usage rarely goes over 10%. Loaded classes are between 20k and 40k. Active 
sessions are around 100-120 for each instance.

Any help or direction to understand what’s causing this is greatly appreciated.
Thank you
—
Ing. Andrea Vettori