Re: [Puppet Users] Puppetserver 6.0.2 timeouts in the puppetserver log and on the agent side

2019-02-12 Thread Henrik Lindberg

On 2019-02-11 21:59, Mike Sharpton wrote:

Hello Henrik,

The heap at 4GB is as high as I would raise it; as you say, GC 
becomes costly with big heaps.  The memory usage ramps up quite quickly 
to well above the configured max heap within minutes.  It comes up to 
about 5.8GB of usage quickly, as we manage many resources on many nodes.  
We do not have many environments.  We normally have a production branch 
and only use a preprod branch to move changes up our environment, with a 
module that manages the puppet.conf on our nodes.  We will keep looking, 
but I don't see a smoking gun.  Anyone else have any ideas?  Puppet 4 
was able to handle this load with only 4 JRuby workers and 4 Puppet 
servers.  Thanks for your help,


Mike


I suppose you have read this: 
https://puppet.com/docs/puppetserver/6.0/tuning_guide.html


I read your description again. I think it would be of value to look at 
the stack trace you get when the timeout occurs, to figure out what it 
is that is timing out.
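
If the log does not include the full trace, a thread dump taken while a 
run is hanging can show what the server is blocked on. A minimal sketch, 
assuming the JDK tools are installed; the pgrep pattern is illustrative:

  # dump all JVM thread stacks to see what requests are blocked on
  jstack -l $(pgrep -f puppet-server-release.jar) > /tmp/ps-threads.txt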

Best,
- henrik



On Monday, February 11, 2019 at 2:06:11 PM UTC-6, Henrik Lindberg wrote:

On 2019-02-11 14:42, Mike Sharpton wrote:
 > Hey all,
 >
 > We have recently upgraded our environment from Puppetserver 4.2.2 to
 > Puppetserver 6.0.2.  We are running a mix of Puppet 4 and Puppet 6
 > agents until we can get them all upgraded to 6.  We have around 6000
 > nodes, and we had 4 Puppetservers, but we added two more due to
 > capacity issues with Puppet 6.  The load is MUCH higher with Puppet 6.
 > To the question, I am seeing longer and longer agent run times after
 > about two days of the services running.  The only error in the logs
 > that seems to have any relation to this is this string.
 >
 > 2019-02-11T04:32:28.409-06:00 ERROR [qtp1148783071-4075] [p.r.core]
 > Internal Server Error: java.io.IOException:
 > java.util.concurrent.TimeoutException: Idle timeout expired: 30001/3 ms
 >
 > After I restart the puppetserver service, this goes away for about two
 > days.  I think Puppetserver is dying a slow death under this load (load
 > average of around 5-6).  We are running Puppetserver on VMs that are
 > 10X8GB and using 6 JRuby workers per Puppetserver and a 4GB heap.  I
 > have not seen any OOM exceptions and the process never crashes.  Has
 > anyone else seen anything like this?  I did some Googling and didn't
 > find a ton of relevant stuff.  Perhaps we need to upgrade to the latest
 > version to see if this helps?  Even more capacity?  Seems silly.
 > Thanks in advance!

There may be a slow memory leak that over time makes the server busy with
non-productive work (scanning for garbage on an ever-increasing heap).
If you were to increase capacity you would risk only pushing the two days
out to a couple more, but not actually solving the issue.

Try to look at server memory usage over the two days.

Also, naturally, upgrade to latest and make sure modules are updated as
well.

Do you by any chance have many environments with different versions of
Ruby code? The environment isolation command "puppet generate types" may
be of help if that is the case, as loaded Ruby resource types become
sticky in memory.

- henrik

 > Mike


-- 
Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/



Re: [Puppet Users] Puppetserver 6.0.2 timeouts in the puppetserver log and on the agent side

2019-02-12 Thread Mike Sharpton
Hello Justin,

We were thinking the same thing about the JRuby workers.  Perhaps we will 
lower them back to 4 and lower the heap size back to 3GB, which worked fine 
before, now that we have added 2 more Puppet servers.  The behavior we see 
is failing Puppet runs like this on random modules.

Could not evaluate: Could not retrieve file metadata for 
puppet:///modules/modulename/resourcename: SSL_connect returned=6 errno=0 
state=unknown state

Our guess is that something took far too long to answer.  Reports are 
fine, PuppetDB is fine; things always make it there.  We see the failures.  
It is likely we have a herd of agents that checks in at once and sometimes 
makes the situation worse, though if that were the whole cause, this 
should happen every 30 minutes, and it doesn't.  I can't think of any 
other server settings we are managing besides JRuby instances, heap size, 
and a tmp dir for Java to work with (/tmp is noexec here).  We are using 
this JVM, and we do not have any custom tuning.

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
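
For reference, the heap size and Java tmp dir described above typically 
live in JAVA_ARGS in the service's defaults file. A minimal sketch, 
assuming a stock EL layout; the tmp-dir path is illustrative:

  # /etc/sysconfig/puppetserver (on Debian: /etc/default/puppetserver)
  # 4GB fixed heap, plus a tmp dir that is not mounted noexec
  JAVA_ARGS="-Xms4g -Xmx4g -Djava.io.tmpdir=/opt/puppetlabs/server/tmp"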

We will try adjusting the JRuby/heap ratio now that we have more 
Puppetservers.  We consistently see all JRuby instances being utilized even 
when set at 6.  Another thing we may consider is doing Puppet runs every 
45 minutes instead of 30, which will lower load as well.  Thanks for 
your thoughts,

Mike
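
The run frequency mentioned above is the agent's runinterval setting. A 
minimal sketch of the change, assuming interval-based agent runs rather 
than an external cron trigger:

  # puppet.conf on each agent
  [agent]
  runinterval = 45m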


On Monday, February 11, 2019 at 5:21:06 PM UTC-6, Justin Stoller wrote:
>
>
>
> On Mon, Feb 11, 2019 at 5:42 AM Mike Sharpton wrote:
>
>> Hey all,
>>
>> We have recently upgraded our environment from Puppetserver 4.2.2 to 
>> Puppetserver 6.0.2.  We are running a mix of Puppet 4 and Puppet 6 agents 
>> until we can get them all upgraded to 6.  We have around 6000 nodes, and we 
>> had 4 Puppetservers, but we added two more due to capacity issues with 
>> Puppet 6.  The load is MUCH higher with Puppet 6.  To the question, I am 
>> seeing longer and longer agent run times after about two days of the 
>> services running.  The only error in the logs that seems to have any 
>> relation to this is this string.
>>
>> 2019-02-11T04:32:28.409-06:00 ERROR [qtp1148783071-4075] [p.r.core] 
>> Internal Server Error: java.io.IOException: 
>> java.util.concurrent.TimeoutException: Idle timeout expired: 30001/3 ms
>>
>>
>> After I restart the puppetserver service, this goes away for about two 
>> days.  I think Puppetserver is dying a slow death under this load (load 
>> average of around 5-6).  We are running Puppetserver on VMs that are 
>> 10X8GB and using 6 JRuby workers per Puppetserver and a 4GB heap.  I have 
>> not seen any OOM exceptions and the process never crashes.  Has anyone else 
>> seen anything like this?  I did some Googling and didn't find a ton of 
>> relevant stuff.  Perhaps we need to upgrade to the latest version to see if 
>> this helps?  Even more capacity?  Seems silly.  Thanks in advance!
>>
>
> Off the top of my head:
> 1. Have you tried lowering the JRuby-worker-to-JVM-heap ratio? (I would 
> try 1GB per worker to see if it really is worker performance.)
> 2. That error is most likely from Jetty (it can be tuned with 
> idle-timeout-milliseconds [1]). Are agent runs failing with a 500 from the 
> server when that happens? Are clients failing to post their facts or 
> reports in a timely manner? Is Puppet Server failing its connections to 
> PuppetDB?
> 3. Are you managing any other server settings? Having a low 
> max-requests-per-instance is problematic for newer servers (they more 
> aggressively compile/optimize the Ruby code the worker loads, so with 
> shorter lifetimes the server does a bunch of work only to throw it away 
> and start over, and that can cause much more load).
> 4. What version of Java are you using, and do you have any custom tuning 
> of Java that maybe doesn't work well with newer servers? Server 5+ only 
> has support for Java 8 and will use more non-heap memory/code cache for 
> those new optimizations mentioned above.
>
> HTH,
> Justin
>
>
> 1. 
> https://github.com/puppetlabs/trapperkeeper-webserver-jetty9/blob/master/doc/jetty-config.md#idle-timeout-milliseconds
>
>
>> Mike
>>


Re: [Puppet Users] Puppetserver 6.0.2 timeouts in the puppetserver log and on the agent side

2019-02-11 Thread Justin Stoller
On Mon, Feb 11, 2019 at 5:42 AM Mike Sharpton  wrote:

> Hey all,
>
> We have recently upgraded our environment from Puppetserver 4.2.2 to
> Puppetserver 6.0.2.  We are running a mix of Puppet 4 and Puppet 6 agents
> until we can get them all upgraded to 6.  We have around 6000 nodes, and we
> had 4 Puppetservers, but we added two more due to capacity issues with
> Puppet 6.  The load is MUCH higher with Puppet 6.  To the question, I am
> seeing longer and longer agent run times after about two days of the
> services running.  The only error in the logs that seems to have any
> relation to this is this string.
>
> 2019-02-11T04:32:28.409-06:00 ERROR [qtp1148783071-4075] [p.r.core]
> Internal Server Error: java.io.IOException:
> java.util.concurrent.TimeoutException: Idle timeout expired: 30001/3 ms
>
>
> After I restart the puppetserver service, this goes away for about two
> days.  I think Puppetserver is dying a slow death under this load (load
> average of around 5-6).  We are running Puppetserver on VMs that are
> 10X8GB and using 6 JRuby workers per Puppetserver and a 4GB heap.  I have
> not seen any OOM exceptions and the process never crashes.  Has anyone else
> seen anything like this?  I did some Googling and didn't find a ton of
> relevant stuff.  Perhaps we need to upgrade to the latest version to see if
> this helps?  Even more capacity?  Seems silly.  Thanks in advance!
>

Off the top of my head:
1. Have you tried lowering the JRuby-worker-to-JVM-heap ratio? (I would
try 1GB per worker to see if it really is worker performance.)
2. That error is most likely from Jetty (it can be tuned with
idle-timeout-milliseconds [1]; a config sketch follows the link below).
Are agent runs failing with a 500 from the server when that happens? Are
clients failing to post their facts or reports in a timely manner? Is
Puppet Server failing its connections to PuppetDB?
3. Are you managing any other server settings? Having a low
max-requests-per-instance is problematic for newer servers (they more
aggressively compile/optimize the Ruby code the worker loads, so with
shorter lifetimes the server does a bunch of work only to throw it away
and start over, and that can cause much more load).
4. What version of Java are you using, and do you have any custom tuning
of Java that maybe doesn't work well with newer servers? Server 5+ only
has support for Java 8 and will use more non-heap memory/code cache for
those new optimizations mentioned above.

HTH,
Justin


1.
https://github.com/puppetlabs/trapperkeeper-webserver-jetty9/blob/master/doc/jetty-config.md#idle-timeout-milliseconds
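
A minimal sketch of where the settings named above live, assuming stock
conf.d paths; the values shown are illustrative, not recommendations:

  # /etc/puppetlabs/puppetserver/conf.d/webserver.conf
  webserver: {
      # Jetty idle timeout; the 30001 ms in the log suggests the 30s default
      idle-timeout-milliseconds: 120000
  }

  # /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf
  jruby-puppet: {
      # point 1: roughly 1GB of heap per JRuby worker
      max-active-instances: 4
      # point 3: keep this high (or unset) to avoid constant worker recycling
      max-requests-per-instance: 100000
  }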


> Mike
>



Re: [Puppet Users] Puppetserver 6.0.2 timeouts in the puppetserver log and on the agent side

2019-02-11 Thread Mike Sharpton
Hello Henrik,

The heap at 4GB is as high as I would raise it; as you say, GC 
becomes costly with big heaps.  The memory usage ramps up quite quickly to 
well above the configured max heap within minutes.  It comes up to about 
5.8GB of usage quickly, as we manage many resources on many nodes.  We do 
not have many environments.  We normally have a production branch and only 
use a preprod branch to move changes up our environment, with a module that 
manages the puppet.conf on our nodes.  We will keep looking, but I don't 
see a smoking gun.  Anyone else have any ideas?  Puppet 4 was able to 
handle this load with only 4 JRuby workers and 4 Puppet servers.  Thanks 
for your help,

Mike
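
A note on the resident-memory numbers above: a JVM's resident set is 
expected to exceed -Xmx, since metaspace, the JIT code cache, and thread 
stacks live outside the heap (Justin's point 4 elsewhere in this thread 
touches on this). If non-heap growth is the concern, those pools can be 
capped explicitly; a sketch with illustrative sizes:

  # extra JAVA_ARGS; sizes are illustrative, not recommendations
  -XX:MaxMetaspaceSize=512m
  -XX:ReservedCodeCacheSize=512m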

On Monday, February 11, 2019 at 2:06:11 PM UTC-6, Henrik Lindberg wrote:
>
> On 2019-02-11 14:42, Mike Sharpton wrote: 
> > Hey all, 
> > 
> > We have recently upgraded our environment from Puppetserver 4.2.2 to 
> > Puppetserver 6.0.2.  We are running a mix of Puppet 4 and Puppet 6 
> > agents until we can get them all upgraded to 6.  We have around 6000 
> > nodes, and we had 4 Puppetservers, but we added two more due to capacity 
> > issues with Puppet 6.  The load is MUCH higher with Puppet 6.  To the 
> > question, I am seeing longer and longer agent run times after about two 
> > days of the services running.  The only error in the logs that seems to 
> > have any relation to this is this string. 
> > 
> > 2019-02-11T04:32:28.409-06:00 ERROR [qtp1148783071-4075] [p.r.core] 
> > Internal Server Error: java.io.IOException: 
> > java.util.concurrent.TimeoutException: Idle timeout expired: 30001/3 ms 
> > 
> > 
> > After I restart the puppetserver service, this goes away for about two 
> > days.  I think Puppetserver is dying a slow death under this load (load 
> > average of around 5-6).  We are running Puppetserver on VMs that are 
> > 10X8GB and using 6 JRuby workers per Puppetserver and a 4GB heap.  I 
> > have not seen any OOM exceptions and the process never crashes.  Has 
> > anyone else seen anything like this?  I did some Googling and didn't 
> > find a ton of relevant stuff.  Perhaps we need to upgrade to the latest 
> > version to see if this helps?  Even more capacity?  Seems silly.  Thanks 
> > in advance! 
> > 
>
> There may be a slow memory leak that over time makes the server busy with 
> non-productive work (scanning for garbage on an ever-increasing heap). 
> If you were to increase capacity you would risk only pushing the two days 
> out to a couple more, but not actually solving the issue. 
>
> Try to look at server memory usage over the two days. 
>
> Also, naturally, upgrade to latest and make sure modules are updated as 
> well. 
>
> Do you by any chance have many environments with different versions of 
> Ruby code? The environment isolation command "puppet generate types" may 
> be of help if that is the case, as loaded Ruby resource types become 
> sticky in memory. 
>
> - henrik 
>
> > Mike 
> > 
> -- 
>
> Visit my Blog "Puppet on the Edge" 
> http://puppet-on-the-edge.blogspot.se/ 
>
>



Re: [Puppet Users] Puppetserver 6.0.2 timeouts in the puppetserver log and on the agent side

2019-02-11 Thread Henrik Lindberg

On 2019-02-11 14:42, Mike Sharpton wrote:

Hey all,

We have recently upgraded our environment from Puppetserver 4.2.2 to 
Puppetserver 6.0.2.  We are running a mix of Puppet 4 and Puppet 6 
agents until we can get them all upgraded to 6.  We have around 6000 
nodes, and we had 4 Puppetservers, but we added two more due to capacity 
issues with Puppet 6.  The load is MUCH higher with Puppet 6.  To the 
question, I am seeing longer and longer agent run times after about two 
days of the services running.  The only error in the logs that seems to 
have any relation to this is this string.


2019-02-11T04:32:28.409-06:00 ERROR [qtp1148783071-4075] [p.r.core] 
Internal Server Error: java.io.IOException: 
java.util.concurrent.TimeoutException: Idle timeout expired: 30001/3 ms



After I restart the puppetserver service, this goes away for about two 
days.  I think Puppetserver is dying a slow death under this load (load 
average of around 5-6).  We are running Puppetserver on VMs that are 
10X8GB and using 6 JRuby workers per Puppetserver and a 4GB heap.  I 
have not seen any OOM exceptions and the process never crashes.  Has 
anyone else seen anything like this?  I did some Googling and didn't 
find a ton of relevant stuff.  Perhaps we need to upgrade to the latest 
version to see if this helps?  Even more capacity?  Seems silly.  Thanks 
in advance!




There may be a slow memory leak that over time makes the server busy with 
non-productive work (scanning for garbage on an ever-increasing heap).
If you were to increase capacity you would risk only pushing the two days 
out to a couple more, but not actually solving the issue.


Try to look at server memory usage over the two days.
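
One way to watch that over time, assuming the JDK tools are available on 
the server; the pgrep pattern is illustrative:

  # print heap occupancy and GC time figures every 60s
  jstat -gcutil $(pgrep -f puppet-server-release.jar) 60000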

Also, naturally, upgrade to latest and make sure modules are updated as 
well.


Do you by any chance have many environments with different versions of 
Ruby code? The environment isolation command "puppet generate types" may 
be of help if that is the case, as loaded Ruby resource types become 
sticky in memory.
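
A minimal sketch of running it, to be repeated per environment and 
whenever an environment's types change:

  # write type metadata so environments don't share loaded Ruby types
  puppet generate types --environment production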


- henrik


Mike




-- 
Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/



[Puppet Users] Puppetserver 6.0.2 timeouts in the puppetserver log and on the agent side

2019-02-11 Thread Mike Sharpton
Hey all,

We have recently upgraded our environment from Puppetserver 4.2.2 to 
Puppetserver 6.0.2.  We are running a mix of Puppet 4 and Puppet 6 agents 
until we can get them all upgraded to 6.  We have around 6000 nodes, and we 
had 4 Puppetservers, but we added two more due to capacity issues with 
Puppet 6.  The load is MUCH higher with Puppet 6.  To the question, I am 
seeing longer and longer agent run times after about two days of the 
services running.  The only error in the logs that seems to have any 
relation to this is this string.

2019-02-11T04:32:28.409-06:00 ERROR [qtp1148783071-4075] [p.r.core] 
Internal Server Error: java.io.IOException: 
java.util.concurrent.TimeoutException: Idle timeout expired: 30001/3 ms


After I restart the puppetserver service, this goes away for about two 
days.  I think Puppetserver is dying a slow death under this load (load 
average of around 5-6).  We are running Puppetserver on VMs that are 
10X8GB and using 6 JRuby workers per Puppetserver and a 4GB heap.  I have 
not seen any OOM exceptions and the process never crashes.  Has anyone else 
seen anything like this?  I did some Googling and didn't find a ton of 
relevant stuff.  Perhaps we need to upgrade to the latest version to see if 
this helps?  Even more capacity?  Seems silly.  Thanks in advance!

Mike
