On 9/29/15 12:20 AM, Matt Jarvis wrote:
count | name
-------+-------------------------------------------------
1 | macaddress_qvb34470225_cd
1 | mtu_qbr2fb476b3_ff
1 | speed_qvbfa2ec4e3_15
1 | macaddress_qvo547572f9_14
1 | speed_qvo2e200191_c0
1 | mtu_qbr5eaffca5_fb
1 | macaddress_qbr0d4ed278_e3
1 | mtu_qvb8166a899_d1
1 | speed_qvb4e0d1069_13
1 | speed_qvbb2d99f31_86
1 | mtu_qbr65afa39a_9a
1 | speed_qvb336884d1_12
1 | speed_qvbf81c2831_4f
1 | mtu_qbr6d9cbcfc_82
1 | mtu_qbr441a8d9c_9e
1 | macaddress_qbrb400a4cf_a3
1 | mtu_qbr0bdbfadc_6a
1 | macaddress_qbrf9e0c7d4_7b
1 | macaddress_qbr3fe74368_2f
1 | macaddress_qvoc943cbcd_c3
1 | macaddress_qvb7e04f0db_2b
1 | mtu_qbrb42e4516_13
1 | macaddress_qvbefdec85e_5b
1 | mtu_qbr4575c981_84
1 | speed_qvbb771b00f_b4
1 | speed_qvo04f9f59c_d2
1 | macaddress_qbre4308db4_12
1 | speed_qvb997d8a21_72
1 | mtu_qvo699d2518_05
1 | mtu_qvbc5dcb18f_8b
1 | mtu_qvb766c608d_7a
1 | speed_qvo137786a3_ce
1 | speed_qvo02ec32fd_28
1 | macaddress_qbr3b6455da_f1
1 | mtu_qvb993a2dfb_5e
1 | macaddress_qvo14369bd5_d3
Is that enough of that query result? We're an OpenStack public cloud
provider, so in our cluster we have many network interfaces changing a
lot when new virtual networks and machines are created - those are all
related to virtual interfaces. Looks like the majority of that table
is full of them.
It's enough to shoot down my theory about structured facts. Assuming the
"desc" was included in the order by, that result indicates that you
aren't storing any structured facts at all.
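To illustrate why a result in which every count is 1 points away from structured facts, here is a minimal sketch that runs the same kind of grouping against an in-memory SQLite table. The two-column table is a simplified stand-in for fact_paths, and the `disks` fact and its path strings are hypothetical; the real PuppetDB schema has more columns and runs on postgres.

```python
import sqlite3

def path_counts(rows):
    """Return (count, name) pairs, largest count first, for (name, path) rows."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for PuppetDB's fact_paths table (assumption).
    conn.execute("CREATE TABLE fact_paths (name TEXT, path TEXT)")
    conn.executemany("INSERT INTO fact_paths VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT COUNT(*) AS n, name FROM fact_paths "
        "GROUP BY name ORDER BY n DESC, name"
    ).fetchall()
    conn.close()
    return result

# Flat facts: the path is just the fact name, so each name occurs once.
flat = [
    ("mtu_qbr2fb476b3_ff", "mtu_qbr2fb476b3_ff"),
    ("speed_qvbfa2ec4e3_15", "speed_qvbfa2ec4e3_15"),
]
# Hypothetical array-valued fact: one path per element, all sharing one name.
structured = [("disks", f"disks[{i}]") for i in range(3)]

print(path_counts(flat))
print(path_counts(flat + structured))
```

With only flat facts every row has a count of 1, which matches the output Matt posted; a large structured fact would appear at the top with a count equal to its number of leaf paths.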
The long parameter list in the query you've identified represents the
fact paths (equivalent to fact names when there are no structured facts)
that become invalidated when a node updates its set of facts in
PuppetDB. In the case of a structured fact, this could happen if you
inserted an element at the beginning of a large array, but with flat
facts like you appear to have I think this would have to mean that a)
the node has 26k+ facts associated with it and b) 26k facts are being
renamed or removed between the last successful puppet run and the run
that's failing.
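As a rough illustration of point b), the sketch below models flat facts as plain dictionaries and works out which fact names disappear between two runs. The function name `removed_fact_names` and the sample values are hypothetical, and this is not how PuppetDB itself builds the parameter list; it only shows how churn in virtual interfaces could make that list grow.

```python
def removed_fact_names(previous, current):
    """Names present in the previous run's facts but absent from the current run's."""
    return sorted(set(previous) - set(current))

# Sample values are made up; the names follow the pattern in the query result.
previous = {
    "mtu_qbr2fb476b3_ff": "1500",
    "speed_qvbfa2ec4e3_15": "10000",
    "kernel": "Linux",
}
# The virtual interfaces were torn down and new ones created between runs.
current = {
    "mtu_qbr5eaffca5_fb": "1500",
    "speed_qvb4e0d1069_13": "10000",
    "kernel": "Linux",
}

print(removed_fact_names(previous, current))
```

Every interface that goes away takes its mtu_, speed_ and macaddress_ facts with it, so a node with thousands of short-lived interfaces could plausibly drop a very large number of names in a single run.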
The final parameter ($26355 in your case) is the factset id of the
node that's failing. You can get the associated certname by taking the
value of that parameter from your postgres logs and issuing
select certname from factsets where id=<value of $26355>;
from psql.
Can you give me answers to the following:
- has PuppetDB been running fine prior to this issue or have you
recently adopted it?
- does it seem possible that you have no structured facts in your database?
- can you give me the first 10 rows of this query?
select count(*),factset_id from facts group by factset_id order by count desc;
- can you get the certname of the failing node using
select certname from factsets where id=<value of $26355>;
and send me the output of
curl -X GET http://localhost:8080/v4/factsets -d \
'query=["=","certname","<your certname>"]'
- once you have the certname, is there anything special about that node
that you're aware of?
- can you send me the compressed contents of the failed replace-facts
commands in your dead letter directory? These will be located at
/opt/puppetlabs/server/data/puppetdb/mq/discarded/replace-facts
if you're on PC1 and
/var/lib/puppetdb/mq/discarded/replace-facts
if you aren't, assuming you're using the default pathing.
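One possible way to bundle the discarded commands before sending them is sketched below using Python's standard tarfile module. The function name `archive_discarded` and the output filename are hypothetical, and the directory is the non-PC1 default mentioned above; treat it as a convenience sketch, not a required procedure.

```python
import tarfile
from pathlib import Path

def archive_discarded(discarded_dir, output_path):
    """Write a gzip-compressed tarball of discarded_dir and return the member names."""
    source = Path(discarded_dir)
    with tarfile.open(output_path, "w:gz") as tar:
        # arcname keeps only the directory's own name inside the archive.
        tar.add(source, arcname=source.name)
    with tarfile.open(output_path, "r:gz") as tar:
        return tar.getnames()

if __name__ == "__main__":
    # Default non-PC1 location from the message above; adjust for your install.
    discarded = "/var/lib/puppetdb/mq/discarded/replace-facts"
    if Path(discarded).is_dir():
        print(archive_discarded(discarded, "replace-facts-discarded.tar.gz"))
```

The returned member list is a quick check that the failed commands actually made it into the archive.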
Additionally, this is probably going to require some back and forth
between us -- if you want to chime in on the ticket at
https://tickets.puppetlabs.com/browse/PDB-2003
we can continue the discussion there, and if you're on IRC I'm
available in #puppet on freenode as wkalt, mostly during work hours
on US Pacific time.
Thanks,
Wyatt
On Monday, September 28, 2015 at 6:45:49 PM UTC+1, Wyatt Alt wrote:
On 09/28/2015 10:39 AM, Wyatt Alt wrote:
On 09/28/2015 05:40 AM, Matt Jarvis wrote:
We seem to have hit a bit of an issue with puppetdb garbage
collection. Initial symptoms were exceptions in the puppetdb logs:
Retrying after attempt 6, due to:
org.postgresql.util.PSQLException: This connection has been closed.
And on the postgres side:
LOG: incomplete message from client
Having turned up the logging on postgres, it appears that the query
DELETE FROM fact_paths fp
WHERE fp.id in ( $some_ids ) AND NOT
EXISTS (SELECT 1 FROM facts f
WHERE f.fact_path_id in ( $some_more_ids ) AND f.fact_path_id
= fp.id
AND f.factset_id <> $26355)
is the culprit. This query is absolutely massive, with over
26000 ids specified as parameters - as soon as the query is
executed, postgres logs "incomplete message from client" and
drops the connection.
puppetdb is 2.3.7-1puppetlabs1
postgres is 9.3
Does anyone have any clues as to what's going on here?
Thanks
Matt
DataCentred Limited registered in England and Wales no. 05611763 --
You received this message because you are subscribed to the
Google Groups "Puppet Users" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to puppet-users...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/puppet-users/5fe3bad3-71a7-4348-a9ff-24d8a0284a1c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hey Matt,
I can reproduce this by inserting a value at the beginning of an
extremely large array-valued structured fact, but we'll need to
know more about your particular data to confirm whether that's
your particular issue. This could be some large custom fact
you're creating or something generated by a module.
I've created a ticket around this issue here:
https://tickets.puppetlabs.com/browse/PDB-2003
Can you connect to the database via psql and share (either here
or in the ticket) the output of
select count(*),name from fact_paths group by name order by count desc;
?
My hope is that that will identify one or more large structured
facts associated with a lot of leaf values, and then we'll need
to figure out where they're coming from.
Wyatt
Just to clarify, I think the top few rows of that result should be
enough to illustrate -- no need to include the whole thing.
Wyatt