On Fri, Feb 7, 2020 at 2:11 AM Yvan Broccard <yvan.brocc...@gmail.com>
wrote:

> Hi,
> About the growing state.yaml:
> I have upgraded to version 6.7 in the meantime. I've seen in the code
> that the issue with :statettl was fixed.
> As the default setting is 32 days, I still need to wait a couple of days
> before being sure it works, since I still had a "tidy" statement for
> cleaning up Puppet reports until recently.
> I still have a huge state.yaml file, containing 200'000 lines and
> referencing 60'000 reports that have already been removed, but these are
> only reports generated during the last month; the oldest entries have a
> ":checked:" date of 2020-01-06.
>

That's odd. Are those yaml reports that puppetserver received and stored
locally due to the default puppet setting "reports=store"? Puppetserver
should be storing reports in
/opt/puppetlabs/server/data/puppetserver/reports/<client>/*yaml, so those
should not appear in state.yaml. Can you confirm whether puppetserver is
loading/updating state.yaml, or only the local puppet agent running on the
server? For example, to confirm puppetserver isn't accessing that file:

$ sudo strace -p <puppetserver java pid> -P
/opt/puppetlabs/puppet/cache/state/state.yaml

Only the agent should ever access the state file and I wouldn't expect
reports to be in there. But it's possible a puppetserver extension
(function, report processor, hiera backend, ...) could be calling
Puppet::Util::Storage.load/store erroneously.
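
One crude way to check for that, assuming any such extensions are deployed
under /etc/puppetlabs/code (adjust the path if yours live elsewhere):

$ sudo grep -rn 'Util::Storage' /etc/puppetlabs/code

Any hit in a module's lib/puppet directory would be a candidate.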


> ... So I guess it works !
>
> Thank you for clarifying this issue !
>
> Yvan
>
> On Thu, Feb 6, 2020 at 6:04 PM Justin Stoller <jus...@puppet.com> wrote:
>
>> Yvan, your issue sounds like
>> https://tickets.puppetlabs.com/browse/PUP-3647; do you know if that is
>> fixed now, or has it regressed since then?
>>
>> Your problem does sound like a CodeCache or Metaspace issue.
>>
>> One tunable you didn't mention was "max-active-instances". I've found a
>> bunch of folks who set that very low to combat leaky code in 5.x or 4.x,
>> despite it causing Puppet & the Ruby runtime to be reloaded frequently.
>> In 6.x that loading became much more expensive, so small values of
>> "max-active-instances" can be very detrimental to performance (and
>> contribute to excessive Metaspace/CodeCache usage).
>>
>> This is also assuming that your servers are both on 6.x and both at the
>> same version. Can you confirm that? There are recent improvements in Server
>> performance that could contribute to (though probably not completely
>> explain) the difference in performance you're seeing, if your new Server is
>> on the latest version and your old server hasn't been upgraded in a few
>> months.
>>
>> HTH,
>> Justin
>>
>>
>>
>> On Thu, Feb 6, 2020 at 8:43 AM KevinR <kevin.reeuw...@puppet.com> wrote:
>>
>>> Hi Martijn,
>>>
>>> It sounds like you have a sub-optimal combination of:
>>>
>>>    - The amount of JRubies
>>>    - The total amount of java heap memory for puppetserver
>>>    - The size of your code base
>>>
>>> This typically causes the kind of problems you're experiencing. What's
>>> happening, in a nutshell, is that puppet is loading so much code into
>>> memory that it starts running out of it and starts performing garbage
>>> collection more and more aggressively. In the end, 95% of all CPU cycles
>>> are spent on garbage collection and you don't have any CPU cycles left
>>> over to actually do work, like compiling catalogs...
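>>>
>>> One rough way to confirm that, assuming the JDK tools are installed and
>>> puppetserver runs as the "puppet" user (swap in the real java pid):
>>>
>>> $ sudo -u puppet jstat -gcutil <puppetserver java pid> 5000
>>>
>>> If the old generation (O) sits near 100% and the full-GC counters (FGC,
>>> FGCT) keep climbing, the heap is too small for the loaded code.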
>>>
>>> To understand how Puppet loads code into memory:
>>>
>>> Your code base is:  ( [ size of your control-repo ] + [ size of all the
>>> modules from the Puppetfile ] )  x  [ the number of puppet code
>>> environments ]
>>> So let's say:
>>>
>>>    - your control repo is 5MB in size
>>>    - all modules together are 95MB in size
>>>    - you have 4 code environments: development, testing, acceptance and
>>>    production
>>>
>>> That's 100MB of code to load in memory, per environment. For 4
>>> environments, that's 400MB.
>>> A different way to get this amount directly is to run *du -h
>>> /etc/puppetlabs/code/environments* on the puppet master and look at the
>>> size reported for */etc/puppetlabs/code/environments*
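>>>
>>> Or, for a per-environment breakdown (with the hypothetical numbers above,
>>> each line would show roughly 100M):
>>>
>>> $ du -sh /etc/puppetlabs/code/environments/*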
>>>
>>> Now every JRuby will load that entire code base into memory. So if you
>>> have 4 JRubies, that's 1600MB of java heap memory that's actually needed.
>>> You can imagine what problems will happen if there isn't this much heap
>>> memory configured...
>>>
>>> If you're using the defaults, Puppet will create the same number of
>>> JRubies as the number of CPU cores on your master, minus 1, with a
>>> maximum of 4 JRubies for the system.
>>> If you override the defaults, you can specify any number of JRubies you
>>> want with the max-active-instances setting.
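>>>
>>> For reference, that setting lives in the jruby-puppet section of
>>> puppetserver.conf; the value below is just an illustration:
>>>
>>> # /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf
>>> jruby-puppet: {
>>>     max-active-instances: 4
>>> }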
>>>
>>> So by default a 2-cpu puppet master will create 1 JRuby, a 4-cpu puppet
>>> master will create 3 JRubies, an 8-cpu puppet master will create 4 JRubies.
>>>
>>> So now you know how to determine the amount of java heap memory you need
>>> to configure, which you can do by configuring the -Xmx and -Xms options in
>>> the JAVA_ARGS section of the puppetserver startup command.
>>> Then finally make sure the host has enough physical memory available to
>>> provide this increased amount of java heap memory.
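>>>
>>> For example (2g is purely illustrative; size it to [number of JRubies] x
>>> [code base size] plus headroom, and the file location depends on your OS):
>>>
>>> # /etc/sysconfig/puppetserver on RHEL, /etc/default/puppetserver on Debian/Ubuntu
>>> JAVA_ARGS="-Xms2g -Xmx2g"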
>>>
>>> Once enough java heap memory is provided, you'll see the cpu usage stay
>>> stable.
>>>
>>> Kind regards,
>>>
>>> Kevin Reeuwijk
>>>
>>> Principal Sales Engineer @ Puppet
>>>
>>> On Thursday, February 6, 2020 at 11:51:42 AM UTC+1, Martijn Grendelman
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> A question about Puppetserver performance.
>>>>
>>>> For quite a while now, our primary Puppet server has been suffering from
>>>> severe slowness and high CPU usage. We have tried to tweak its settings,
>>>> giving it more memory (Xmx = 6 GB at the moment) and toying with the
>>>> 'max-active-instances' setting, to no avail. The server has 8 virtual cores
>>>> and 12 GB memory in total, to run Puppetserver, PuppetDB and PostgreSQL.
>>>>
>>>> Notably, after a restart, the performance is acceptable for a while
>>>> (several hours, up to almost a day), but then it plummets again.
>>>>
>>>> We figured that the server was just unable to cope with the load (we
>>>> had over 270 nodes talking to it in 30 min intervals), so we added a second
>>>> master that now takes more than half of that load (150 nodes). That did not
>>>> make any difference at all for the primary server. The secondary server
>>>> however, has no trouble at all dealing with the load we gave it.
>>>>
>>>> In the graph below, which displays catalog compilation times for both
>>>> servers, you can see the new master in green. It has consistently high
>>>> performance. The old master is in yellow. After a restart, the compile
>>>> times are good (not great) for a while. The first dip represents ca. 4
>>>> hours, the second dip was 18 hours. At some point, the catalog compilation
>>>> times sky-rocket, as does the server load. 10 seconds in the graph below
>>>> corresponds to a server load of around 2, while 40 seconds corresponds to a
>>>> server load of around 5. It's the Puppetserver process using the CPU.
>>>>
>>>> The second server, the green line, has a consistent server load of
>>>> around 1, with 4 GB memory (2 GB for the Puppetserver JVM) and 2 cores
>>>> (it's an EC2 t3.medium).
>>>>
>>>>
>>>> [inline image: graph of catalog compilation times for both servers]
>>>>
>>>> If I have 110 nodes, doing two runs per hour that each take 30 seconds,
>>>> I would still have a concurrency of less than 2 (110 x 2 x 30 s = 6,600 s
>>>> of compile time per 3,600 s hour), so Puppet causing a consistent load of
>>>> 5 seems strange. My first thought would be that it's garbage collection or
>>>> something like that, but the server has plenty of memory (the OS cache has
>>>> 2 GB).
>>>>
>>>> Any ideas on what makes the Puppetserver start using so much CPU?
>>>> What can we try to keep it down?
>>>>
>>>> Thanks,
>>>> Martijn Grendelman
>>>>


-- 
Josh Cooper | Software Engineer
j...@puppet.com
