Jira (PDB-5469) PuppetDB services shutting down due to query performance
Subramanian Kalidass commented on PDB-5469
Re: PuppetDB services shutting down due to query performance
This issue is a clone of PE-33705 for the sake of customer viewing and support ticket closure.
Jira (PDB-5469) PuppetDB services shutting down due to query performance
Subramanian Kalidass updated an issue
PuppetDB / PDB-5469 PuppetDB services shutting down due to query performance
Change By: Subramanian Kalidass

h3. Summary:
The customer reported that running a node query with 50 nodes failed with the console error "Error resolving pql query". The query was an arbitrary selection of 50 nodes joined with the OR condition.

h5. Did it ever work?
It worked for 20 nodes; with 50 nodes it failed with the PQL error.

h5. When did the issue first occur?
Not sure. The customer reported this issue as a possible bug.

h5. If applicable, did you do a run with --debug --trace?
N/A

h4. OS & Version:
Ubuntu 18.04.6 LTS (Bionic Beaver)

h4. PE Version on the affected machine:
2021.5

h4. Master Or Agent Affected:
Master

h4. All In One Or Split Install:
Split install

h4. Steps Taken To Reproduce:
The customer has provided the below steps to reproduce this error:

1. Perform an expensive PQL query from the console. In our case, we were able to reproduce the problem with 30+ OR statements in a query:
inventory[certname] \{ certname ~ "bastion-i-0df61833aa00e6acc" or certname ~ "compile-master-i-05d0fd872d2b1964f" or certname ~ "compile-master-i-061f9a63122235787"}
(and so on, of course, with 30+ nodes)
2. There is no step 2!

When running this query, I get the error in the console "Error resolving pql query: Server Error". It appears that the UI timeout occurs after 60 seconds, and the user would have no idea they caused a problem.

On the backend, PuppetDB queries start getting slower and slower, and the load on the server in question climbs. In at least one test, if left unchecked, the load runs away and `apport` runs to try to gather a crash dump.
{code:java}
//
2022-03-10T17:26:59.813Z WARN [p.p.h.query] Parsing PQL took 3,705.52 ms: "nodes[certname] { report_timestamp > \"2022-03-01T00:00:00Z\" }"
2022-03-10T17:27:05.470Z WARN [p.p.h.query] Parsing PQL took 9,362.058 ms: "resources[certname] {\n type = 'Class' and\n title = 'Puppet_enterprise::Profile::Master' and\n nodes{ deactivated is null and expired is null }\n order by certname\n }"
{code}

Files Acquired:
Support Bundle from Primary master and external puppet database

h4. Relevant Error Messages:
We can see the following error in the PuppetDB logs, which coincides with the customer's attempt to execute the query.

{code:java}
// error from the puppetdb Log
at java.base/java.lang.Thread.run(Thread.java:829)
2022-03-10T17:42:47.401Z INFO [p.p.c.services] Periodic activities halted
2022-03-10T17:42:47.402Z INFO [c.z.h.HikariDataSource] PDBWritePool - Shutdown initiated...
2022-03-10T17:43:03.587Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-392 to stderr and log
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7294c26[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@22c31f38[Wrapped task = puppetlabs.puppetdb.command$schedule_delayed_message$f
Jira (PDB-5469) PuppetDB services shutting down due to query performance
Subramanian Kalidass updated an issue
PuppetDB / PDB-5469 PuppetDB services shutting down due to query performance
Change By: Subramanian Kalidass

h3. Summary:
The customer reported that running a node query with 50 nodes failed with the console error "Error resolving pql query". The query was an arbitrary selection of 50 nodes joined with the OR condition.

h5. Customer Organization Name:
Splunk Release Engineering

h5. Did it ever work?
As per the ticket, it worked for 20 nodes; with 50 nodes it failed with the PQL error.

h5. When did the issue first occur?
Not sure. The customer reported this issue as a possible bug.

h5. If applicable, did you do a run with --debug --trace?
N/A

h4. OS & Version:
Ubuntu 18.04.6 LTS (Bionic Beaver)

h4. PE Version on the affected machine:
2021.5

h4. Master Or Agent Affected:
Master

h4. All In One Or Split Install:
Split install

h4. Steps Taken To Reproduce:
The customer has provided the below steps to reproduce this error:

1. Perform an expensive PQL query from the console. In our case, we were able to reproduce the problem with 30+ OR statements in a query:
inventory[certname] \{ certname ~ "bastion-i-0df61833aa00e6acc" or certname ~ "compile-master-i-05d0fd872d2b1964f" or certname ~ "compile-master-i-061f9a63122235787"}
(and so on, of course, with 30+ nodes)
2. There is no step 2!

When running this query, I get the error in the console "Error resolving pql query: Server Error". It appears that the UI timeout occurs after 60 seconds, and the user would have no idea they caused a problem.

On the backend, PuppetDB queries start getting slower and slower, and the load on the server in question climbs. In at least one test, if left unchecked, the load runs away and `apport` runs to try to gather a crash dump.
{code:java}
//
2022-03-10T17:26:59.813Z WARN [p.p.h.query] Parsing PQL took 3,705.52 ms: "nodes[certname] { report_timestamp > \"2022-03-01T00:00:00Z\" }"
2022-03-10T17:27:05.470Z WARN [p.p.h.query] Parsing PQL took 9,362.058 ms: "resources[certname] {\n type = 'Class' and\n title = 'Puppet_enterprise::Profile::Master' and\n nodes{ deactivated is null and expired is null }\n order by certname\n }"
{code}

Files Acquired:
Support Bundle from Primary master and external puppet database

h4. Relevant Error Messages:
We can see the following error in the PuppetDB logs, which coincides with the customer's attempt to execute the query.

{code:java}
// error from the puppetdb Log
at java.base/java.lang.Thread.run(Thread.java:829)
2022-03-10T17:42:47.401Z INFO [p.p.c.services] Periodic activities halted
2022-03-10T17:42:47.402Z INFO [c.z.h.HikariDataSource] PDBWritePool - Shutdown initiated...
2022-03-10T17:43:03.587Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-392 to stderr and log
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7294c26[Not completed,
Jira (PDB-5469) PuppetDB services shutting down due to query performance
Subramanian Kalidass created an issue
PuppetDB / PDB-5469 PuppetDB services shutting down due to query performance

Issue Type: Bug
Assignee: Unassigned
Components: PuppetDB
Created: 2022/03/27 11:14 PM
Priority: Normal
Reporter: Subramanian Kalidass

Summary:
The customer reported that running a node query with 50 nodes failed with the console error "Error resolving pql query". The query was an arbitrary selection of 50 nodes joined with the OR condition.

Customer Organization Name: Splunk Release Engineering

Did it ever work?
As per the ticket, it worked for 20 nodes; with 50 nodes it failed with the PQL error.

When did the issue first occur?
Not sure. The customer reported this issue as a possible bug.

If applicable, did you do a run with --debug --trace?
N/A

OS & Version: Ubuntu 18.04.6 LTS (Bionic Beaver)
PE Version on affected machine: 2021.5
Master Or Agent Affected: Master
All In One Or Split Install: Split install

Steps Taken To Reproduce:
The customer has provided the below steps to reproduce this error:

1. Perform an expensive PQL query from the console. In our case, we were able to reproduce the problem with 30+ OR statements in a query:
inventory[certname] { certname ~ "bastion-i-0df61833aa00e6acc" or certname ~ "compile-master-i-05d0fd872d2b1964f" or certname ~ "compile-master-i-061f9a63122235787"}
(and so on, of course, with 30+ nodes)
2. There is no step 2!

When running this query, I get the error in the console "Error resolving pql query: Server Error". It appears that the UI timeout occurs after 60 seconds, and the user would have no idea they caused a problem.

On the backend, PuppetDB queries start getting slower and slower, and the load on the server in question climbs. In at least one test, if left unchecked, the load runs away and `apport` runs to try to gather a crash dump.
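
For anyone trying to reproduce step 1 with a different node list, the query shape can be generated rather than typed by hand. This is a minimal sketch and not part of the customer's report: `build_or_query` is a hypothetical helper, the `node-NNN` certnames are placeholders, and it assumes certnames contain no double quotes or backslashes. It only builds the query string; submitting it (through the console or otherwise) is left out.

```python
def build_or_query(certnames):
    """Build an inventory query with one `certname ~ "..."` clause per node,
    joined with `or`, mirroring the shape of the query in the report."""
    if not certnames:
        raise ValueError("need at least one certname")
    # One regex-match clause per certname, as in the reported query.
    clauses = ['certname ~ "{}"'.format(name) for name in certnames]
    return "inventory[certname] { " + " or ".join(clauses) + " }"


# Placeholder certnames (assumption); substitute real ones to reproduce.
names = ["node-{:03d}".format(i) for i in range(50)]
query = build_or_query(names)
print(query)
```

Varying the length of `names` (for example 20 versus 50) gives a simple way to look for the point at which the console starts returning "Error resolving pql query".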