Re: [Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes

2012-11-21 Thread Jakov Sosic

On 11/22/2012 02:20 AM, Deepak Giridharagopal wrote:


 From my earlier reply:


Ah, sorry, my mistake. I didn't saw that because the first part of your 
answer was top-posted so I didn't even bother to scroll further down...


Thank you nevertheless :)


--
Jakov Sosic
www.srce.unizg.hr

--
You received this message because you are subscribed to the Google Groups "Puppet 
Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.



Re: [Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes

2012-11-21 Thread Deepak Giridharagopal
On Wed, Nov 21, 2012 at 5:43 PM, Jakov Sosic  wrote:

> On 11/15/2012 09:26 PM, Deepak Giridharagopal wrote:
> > We (grim_radical, nlew, and cprice) have continued working with Matt
> > (sjoeboo) on the #puppet IRC channel over the last few days. Apologies
> > for not updating this thread accordingly!
> >
>
> jcbollinger asked about documentation of the schema of database, you
> seem to missed that part for some reason?
>
>
>From my earlier reply:

"The schema is defined here:
https://github.com/puppetlabs/puppetdb/blob/master/src/com/puppetlabs/puppetdb/scf/migrate.clj

It is currently represented as a "base" schema, with migrations on top that
modify it. We should probably have a complementary version of this code
that has the entire schema in totality in one shot, instead of building it
up incrementally. Or at a minimum, a dev document that outlines the schema.
I'd very much welcome some community help on that!"

deepak

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.



Re: [Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes

2012-11-21 Thread Jakov Sosic
On 11/15/2012 09:26 PM, Deepak Giridharagopal wrote:
> We (grim_radical, nlew, and cprice) have continued working with Matt
> (sjoeboo) on the #puppet IRC channel over the last few days. Apologies
> for not updating this thread accordingly!
> 

jcbollinger asked about documentation of the schema of database, you
seem to missed that part for some reason?

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.



Re: [Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes

2012-11-15 Thread Deepak Giridharagopal
We (grim_radical, nlew, and cprice) have continued working with Matt
(sjoeboo) on the #puppet IRC channel over the last few days. Apologies for
not updating this thread accordingly!

On Thu, Nov 15, 2012 at 7:53 AM, jcbollinger wrote:

>
>
> On Wednesday, November 14, 2012 11:37:23 AM UTC-6, Matt wrote:
>>
>> I was working on this a bit w/ grim_radical/nlewis on Friday afternoon,
>> and wanted to loop back, figured at least starting the conversion here
>> again was a good idea.
>>
>> We have starting having problems with one query slowing our puppetdb way,
>> way down, and seemingly blocking other queries/node runs.
>>
>> We are an HPC shop, and have about 2K nodes. of that, about 1500 do the
>> following:
>>
>> @@sshkey { "${hostname}":
>>   host_aliases=> ["$fqdn", "$ipaddress" ],
>>   type=> "rsa",
>>   key => $sshrsakey,
>>   ensure   => present,
>> }
>>
>> Sshkey <<| type == "rsa" |>> {ensure => present}
>>
>> Thats it. nothing really crazy/special in there.
>>
>> This ends up as:
>>
>> LOG:  duration: 5690.773 ms  execute : select results.* from
>> (SELECT certname_catalogs.certname, catalog_resources.resource,
>> catalog_resources.type, catalog_resources.title,**catalog_resources.tags,
>> catalog_resources.exported, catalog_resources.sourcefile,
>> catalog_resources.sourceline, rp.name, rp.value FROM catalog_resources
>> JOIN certname_catalogs USING(catalog) LEFT OUTER JOIN resource_params rp
>> USING(resource) INNER JOIN certnames ON certname_catalogs.certname =
>> certnames.name WHERE (catalog_resources.type = $1) AND
>> (catalog_resources.exported = $2) AND (certnames.deactivated IS NULL) AND
>> (NOT ((certname_catalogs.certname = $3))) AND (catalog_resources.resource
>> IN (SELECT rp.resource FROM resource_params rp WHERE rp.name = $4 AND
>> rp.value = $5))) results LIMIT 50001
>> DETAIL:  parameters: $1 = 'Sshkey', $2 = 't', $3 = 'hero4209', $4 =
>> 'type', $5 = '"rsa"'
>>
>> Would adding an index on this be an option (i'm not a huge postgres guru,
>> maybe I'm using the wrong terms).
>>
>> As soon as we commented out the collection, like:
>>
>> #Sshkey <<| type == "rsa" |>> {ensure => present}
>>
>> Things all go back to normal, and nodes run nice and quickly. With that
>> in there, nodes would hang running and start timing out. Our 2K nodes are
>> on a 2 hour run interval.
>>
>> Any help/thoughts? I'm in irc as sjoeboo as well.
>>
>>
>
>
> First, make sure you are using thin_storeconfigs.  After making the switch
> (if it is a switch), it may take some time for the all nodes' changes to
> propagate to the DB, but the difference should be a lot fewer rows in your
> DB.  That could speed you up far more than any indexing.
>

Just an FYI...thin_storeconfigs doesn't have an effect when using PuppetDB,
as we always store the entire catalog every time. I'm confident we can come
up with a perf fix without resorting to storing significantly less
information, but that largely depends on how clever we are. :)


>
> Also, I presume that you are already using the PostgreSQL back-end instead
> of the built-in one, but if not then you should switch now.  The built-in
> back end is simply not up to the task of handing so many nodes efficiently.
>
> The query itself looks like it could use some optimization, but that's out
> of your hands unless you want to hack on Puppet itself.
>
> I don't know which columns may be indexed already, and I didn't find any
> documentation of the schema at PL or in puppetlabs' GitHub (what's up with
> that, PL?).  It might indeed be the case that adding indexes on one or more
> key columns would help you out, but you really ought to tackle this in a
> systematic manner.
>

There are two pieces to this: there is the puppetdb query that's formulated
on the puppet side, inside the puppetdb "resource" terminus. Then there's
the translation of that query to low-level SQL, which happens inside of the
PuppetDB daemon.

The schema is defined here:
https://github.com/puppetlabs/puppetdb/blob/master/src/com/puppetlabs/puppetdb/scf/migrate.clj

It is currently represented as a "base" schema, with migrations on top that
modify it. We should probably have a complementary version of this code
that has the entire schema in totality in one shot, instead of building it
up incrementally. Or at a minimum, a dev document that outlines the schema.
I'd very much welcome some community help on that!


>
> Specifically, use a query analyzer (I presume Postgres has one) to
> identify the expensive parts of that query, and consider adding indices
> that will improve those parts (e.g. indices on columns of long tables that
> serve as join columns or WHERE criteria).  Lather, rinse, repeat until it's
> good enough or you can't do any better.
>
> Be aware also that time saved in the query will be partially offset by
> time consumed in maintaining each additional index you create.  If you
> choose strategically then you could conceivably see a dramatic overall
> 

[Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes

2012-11-15 Thread jcbollinger


On Wednesday, November 14, 2012 11:37:23 AM UTC-6, Matt wrote:
>
> I was working on this a bit w/ grim_radical/nlewis on Friday afternoon, 
> and wanted to loop back, figured at least starting the conversion here 
> again was a good idea.
>
> We have starting having problems with one query slowing our puppetdb way, 
> way down, and seemingly blocking other queries/node runs.
>
> We are an HPC shop, and have about 2K nodes. of that, about 1500 do the 
> following:
>
> @@sshkey { "${hostname}":
>   host_aliases=> ["$fqdn", "$ipaddress" ],
>   type=> "rsa",
>   key => $sshrsakey,
>   ensure   => present,
> }
>
> Sshkey <<| type == "rsa" |>> {ensure => present}
>
> Thats it. nothing really crazy/special in there. 
>
> This ends up as:
>
> LOG:  duration: 5690.773 ms  execute : select results.* from 
> (SELECT certname_catalogs.certname, catalog_resources.resource, 
> catalog_resources.type, catalog_resources.title,catalog_resources.tags, 
> catalog_resources.exported, catalog_resources.sourcefile, 
> catalog_resources.sourceline, rp.name, rp.value FROM catalog_resources 
> JOIN certname_catalogs USING(catalog) LEFT OUTER JOIN resource_params rp 
> USING(resource) INNER JOIN certnames ON certname_catalogs.certname = 
> certnames.name WHERE (catalog_resources.type = $1) AND 
> (catalog_resources.exported = $2) AND (certnames.deactivated IS NULL) AND 
> (NOT ((certname_catalogs.certname = $3))) AND (catalog_resources.resource 
> IN (SELECT rp.resource FROM resource_params rp WHERE rp.name = $4 AND 
> rp.value = $5))) results LIMIT 50001
> DETAIL:  parameters: $1 = 'Sshkey', $2 = 't', $3 = 'hero4209', $4 = 
> 'type', $5 = '"rsa"'
>
> Would adding an index on this be an option (i'm not a huge postgres guru, 
> maybe I'm using the wrong terms). 
>
> As soon as we commented out the collection, like:
>
> #Sshkey <<| type == "rsa" |>> {ensure => present}
>
> Things all go back to normal, and nodes run nice and quickly. With that in 
> there, nodes would hang running and start timing out. Our 2K nodes are on a 
> 2 hour run interval.
>
> Any help/thoughts? I'm in irc as sjoeboo as well.
>  
>


First, make sure you are using thin_storeconfigs.  After making the switch 
(if it is a switch), it may take some time for the all nodes' changes to 
propagate to the DB, but the difference should be a lot fewer rows in your 
DB.  That could speed you up far more than any indexing.

Also, I presume that you are already using the PostgreSQL back-end instead 
of the built-in one, but if not then you should switch now.  The built-in 
back end is simply not up to the task of handing so many nodes efficiently.

The query itself looks like it could use some optimization, but that's out 
of your hands unless you want to hack on Puppet itself.

I don't know which columns may be indexed already, and I didn't find any 
documentation of the schema at PL or in puppetlabs' GitHub (what's up with 
that, PL?).  It might indeed be the case that adding indexes on one or more 
key columns would help you out, but you really ought to tackle this in a 
systematic manner.

Specifically, use a query analyzer (I presume Postgres has one) to identify 
the expensive parts of that query, and consider adding indices that will 
improve those parts (e.g. indices on columns of long tables that serve as 
join columns or WHERE criteria).  Lather, rinse, repeat until it's good 
enough or you can't do any better.

Be aware also that time saved in the query will be partially offset by time 
consumed in maintaining each additional index you create.  If you choose 
strategically then you could conceivably see a dramatic overall gain, but 
don't go overboard.


John

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/puppet-users/-/e_-i0KSXnIUJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.