On 08/05/2014 04:08 PM, john gale wrote:

Most of the planner options haven't diverged from default, so
default_statistics_target is still set to 100.  I'm vaguely
understanding the docs on that variable, but if we have the space it
sounds like we should bump this up significantly to accommodate more
statistics?

Yeah. You'll want to bump that up. However, I'm not sure how much of a difference that'll make to the hstore stuff due to the blobby nature of that type of data.

Also, that doesn't make sense to me, since we don't have 2.5mil rows
that match this one SpawnID.  Could this suggest that my partial
hstore index is somehow misconstructed?  Or is that saying that
2.5mil rows have a SpawnID, not all of which will be the one I'm
looking for?

I'm not sure. But according to that explain analyze, it matched 2.5 million... somethings. Perhaps it has to do with how hstore is actually indexed. Maybe it starts at SpawnID, then narrows it down with SpawnID-428870395.258592, but can't do so until it fetches the SpawnID part. I'll stop commenting on hstore anything, because I never use it.

Not necessarily 300k records, but yes we essentially have one 100G+
table that we pull results to analyze from, whether it's 30 records
or 3k.  300k seems like an abnormality but that's probably why I
jumped on this one particular query in the first place.

Yeah. Like I said, it's all of that preliminary work that's inflating your execution time. Something tells me that better stats might help you out. Increase default_statistics_target to 300 or 400, analyze the table, and try again. See what happens.

Interestingly, make the select clause *less* specific by removing the
started_at part has sped this up quite a bit (if I'm reading the
explain correctly, which I don't really have a history of doing):

You might think that, but not really. :) Consider this:

->  Index Scan using index_testruns_on_custom_spawnid on testruns
(cost=0.57..306430.20 rows=319935 width=1399) (actual
time=0.030..177067.785 rows=348672 loops=1)

That's the only index scan it applied. By not using the much larger index with far more hits (5-million for the previous "more specific" query) it did less work. Something tells me that the index_testruns_on_started_at_2 index is confusing the planner into doing way more than it should.

You might be better off by using this query in a CTE (WITH syntax) and then using a post-processing WHERE clause to filter to the proper time range. Tricking the planner that way is kinda gross, but it works. This is assuming your stat adjustment doesn't work.

I've also been thinking about creating partial date indexes since
that's one of the few easy buckets to shove things into, which seems
like it would cut down on the searchable index size.

Properly applying partial indexes without overdoing it is a delicate dance, but it can pay off in spades. Try it, it might work out. ;)

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to