On 9/29/15 12:20 AM, Matt Jarvis wrote:

 count | name
-------+-------------------------------------------------
     1 | macaddress_qvb34470225_cd
     1 | mtu_qbr2fb476b3_ff
     1 | speed_qvbfa2ec4e3_15
     1 | macaddress_qvo547572f9_14
     1 | speed_qvo2e200191_c0
     1 | mtu_qbr5eaffca5_fb
     1 | macaddress_qbr0d4ed278_e3
     1 | mtu_qvb8166a899_d1
     1 | speed_qvb4e0d1069_13
     1 | speed_qvbb2d99f31_86
     1 | mtu_qbr65afa39a_9a
     1 | speed_qvb336884d1_12
     1 | speed_qvbf81c2831_4f
     1 | mtu_qbr6d9cbcfc_82
     1 | mtu_qbr441a8d9c_9e
     1 | macaddress_qbrb400a4cf_a3
     1 | mtu_qbr0bdbfadc_6a
     1 | macaddress_qbrf9e0c7d4_7b
     1 | macaddress_qbr3fe74368_2f
     1 | macaddress_qvoc943cbcd_c3
     1 | macaddress_qvb7e04f0db_2b
     1 | mtu_qbrb42e4516_13
     1 | macaddress_qvbefdec85e_5b
     1 | mtu_qbr4575c981_84
     1 | speed_qvbb771b00f_b4
     1 | speed_qvo04f9f59c_d2
     1 | macaddress_qbre4308db4_12
     1 | speed_qvb997d8a21_72
     1 | mtu_qvo699d2518_05
     1 | mtu_qvbc5dcb18f_8b
     1 | mtu_qvb766c608d_7a
     1 | speed_qvo137786a3_ce
     1 | speed_qvo02ec32fd_28
     1 | macaddress_qbr3b6455da_f1
     1 | mtu_qvb993a2dfb_5e
     1 | macaddress_qvo14369bd5_d3


Is that enough of that query result? We're an OpenStack public cloud provider, so our cluster has a large number of network interfaces that change constantly as new virtual networks and machines are created. Those facts all relate to virtual interfaces, and it looks like the majority of that table is made up of them.


It's enough to shoot down my theory about structured facts. Assuming the "desc" was included in the order by, that result indicates that you aren't storing any structured facts at all.

The long parameter list in the query you've identified represents the fact paths (equivalent to fact names when there are no structured facts) that become invalidated when a node updates its set of facts in PuppetDB. In the case of a structured fact, this could happen if you inserted an element at the beginning of a large array, but with flat facts like you appear to have I think this would have to mean that a) the node has 26k+ facts associated with it and b) 26k facts are being renamed or removed between the last successful puppet run and the run that's failing.
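To make the flat-fact case concrete, here is a minimal Python sketch of the idea. It is only an illustration and not PuppetDB's actual code: the function name `invalidated_paths` and the fact values are made up, and the fact names are borrowed from the query result above. With flat facts, the paths a node stops using are just the names in its previous fact set that are missing from the new one.

```python
# Illustrative sketch only: this is not PuppetDB's implementation.
# With flat facts a fact path is just the fact name, so the paths a node
# stops using are the names in its old fact set missing from the new one.

def invalidated_paths(old_facts, new_facts):
    """Return fact names present in old_facts but absent from new_facts."""
    return sorted(set(old_facts) - set(new_facts))

# Hypothetical values; the names are shaped like the interface facts above.
old_facts = {
    "mtu_qbr2fb476b3_ff": "1500",
    "speed_qvbfa2ec4e3_15": "10000",
    "macaddress_qvb34470225_cd": "aa:bb:cc:dd:ee:ff",
    "kernel": "Linux",
}
new_facts = {
    "mtu_qbr5eaffca5_fb": "1500",
    "kernel": "Linux",
}

stale = invalidated_paths(old_facts, new_facts)
print(stale)
# prints ['macaddress_qvb34470225_cd', 'mtu_qbr2fb476b3_ff', 'speed_qvbfa2ec4e3_15']
```

On a node where thousands of per-interface facts come and go between runs, a difference like this could plausibly grow to the size seen in the failing query.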

The final parameter ($26355 in your case) is the factset id of the node that's failing. You can get the associated certname by taking the value of that parameter from your postgres logs and issuing

select certname from factsets where id=<value of $26355>;

from psql.

Can you give me answers to the following:
- has PuppetDB been running fine prior to this issue or have you recently adopted it?
- does it seem possible that you have no structured facts in your database?
- can you give me the first 10 rows of this query?
select count(*),factset_id from facts group by factset_id order by count desc;
- can you get the certname of the failing node using
select certname from factsets where id=<value of $26355>;
and send me the output of
curl -X GET http://localhost:8080/v4/factsets -d 'query=["=","certname","<your certname>"]'
- once you have the certname, is there anything special about that node that you're aware of?
- can you send me the compressed contents of the failed replace-facts commands in your dead letter directory? These will be located at
/opt/puppetlabs/server/data/puppetdb/mq/discarded/replace-facts
if you're on PC1 and
/var/lib/puppetdb/mq/discarded/replace-facts
if you aren't, assuming you're using the default pathing.
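
For the last item, one possible way to bundle the dead letter directory is sketched below in Python using the standard `tarfile` module. The helper name `archive_discarded` and the output filename are hypothetical; the source path is the non-PC1 default mentioned above and would need adjusting on PC1.

```python
import tarfile
from pathlib import Path

def archive_discarded(discard_dir, out_path):
    """Write a gzip-compressed tarball of discard_dir; return the member count."""
    discard_dir = Path(discard_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname keeps only the directory's own name inside the archive
        tar.add(discard_dir, arcname=discard_dir.name)
    # Re-open the archive to count what was written (directory entry + files)
    with tarfile.open(out_path, "r:gz") as tar:
        return len(tar.getnames())

if __name__ == "__main__":
    # Default non-PC1 location from the message above; adjust for PC1.
    src = "/var/lib/puppetdb/mq/discarded/replace-facts"
    if Path(src).is_dir():
        # Output filename is an arbitrary choice for this sketch.
        print(archive_discarded(src, "replace-facts-discarded.tar.gz"))
```

The script only reads the directory, so it should be safe to run while PuppetDB is up, though you may need sufficient permissions to read the files.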

Additionally, this is probably going to require some back and forth between us -- if you want to chime in on the ticket at https://tickets.puppetlabs.com/browse/PDB-2003 we can continue the discussion there, and if you're on IRC I'm available in #puppet on freenode as wkalt, mostly during work hours on US pacific time.

Thanks,
Wyatt

On Monday, September 28, 2015 at 6:45:49 PM UTC+1, Wyatt Alt wrote:

    On 09/28/2015 10:39 AM, Wyatt Alt wrote:
    On 09/28/2015 05:40 AM, Matt Jarvis wrote:
    We seem to have hit a bit of an issue with puppetdb garbage
    collection. Initial symptoms were exceptions in the puppetdb logs :

    Retrying after attempt 6, due to:
    org.postgresql.util.PSQLException: This connection has been closed.


    And on the postgres side :


    LOG:  incomplete message from client


    Having turned up the logging on postgres, it appears that the query


    DELETE FROM fact_paths fp
      WHERE fp.id in ( $some_ids ) AND NOT EXISTS (SELECT 1 FROM facts f
        WHERE f.fact_path_id in ( $some_more_ids ) AND f.fact_path_id = fp.id
          AND f.factset_id <> $26355)


    is the culprit. This query is absolutely massive, with over
    26000 IDs specified as parameters - as soon as the query is
    executed, postgres returns "incomplete message from client" and
    drops the connection.


    puppetdb is 2.3.7-1puppetlabs1

    postgres is 9.3


    Does anyone have any clues what's going on here ?


    Thanks


    Matt


    DataCentred Limited registered in England and Wales no. 05611763 --
    You received this message because you are subscribed to the
    Google Groups "Puppet Users" group.
    To unsubscribe from this group and stop receiving emails from
    it, send an email to puppet-users...@googlegroups.com.
    To view this discussion on the web visit
    https://groups.google.com/d/msgid/puppet-users/5fe3bad3-71a7-4348-a9ff-24d8a0284a1c%40googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.

    Hey Matt,

    I can reproduce this by inserting a value at the beginning of an
    extremely large array-valued structured fact, but we'll need to
    know more about your particular data to confirm whether that's
    your particular issue. This could be some large custom fact
    you're creating or something generated by a module.

    I've created a ticket around this issue here:
    https://tickets.puppetlabs.com/browse/PDB-2003

    can you connect to the database via psql and share (either here
    or in the ticket) the output of

    select count(*),name from fact_paths group by name order by count desc;

    ?

    My hope is that that will identify one or more large structured
    facts associated with a lot of leaf values, and then we'll need
    to figure out where they're coming from.

    Wyatt


    Just to clarify, I think the top few rows of that result should be
    enough to illustrate -- no need to include the whole thing.

    Wyatt


