Re: [SQL] [PERFORM] SQL Query Performance - what gives?

2009-08-18 Thread Pierre Frédéric Caillau d



The bitmask allows the setting of multiple permissions but the table
definition doesn't have to change (well, so long as the bits fit into a
word!)  Finally, this is a message forum - the actual code itself is
template-driven and the bitmask permission structure is ALL OVER the
templates; getting that out of there would be a really nasty rewrite,
not to mention breaking the user (non-developer, but owner)
extensibility of the current structure.

Is there a way to TELL the planner how to deal with this, even if it
makes the SQL non-portable or is a hack on the source mandatory?


	You could use an integer array instead of a bit mask, make a gist index  
on it, and instead of doing "mask & xxx" do "array contains xxx", which is  
indexable with gist. The idea is that it can get much better row  
estimation. Instead of 1,2,3, you can use 1,2,4,8, etc if you like. you'd  
probably need a function to convert a bitmask into ints and another to do  
the conversion back, so the rest of your app gets the expected bitmasks.  
Or add a bitmask type to postgres with ptoper statistics...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [SQL] [PERFORM] SQL Query Performance - what gives?

2009-08-18 Thread Karl Denninger
Tom Lane wrote:
> Karl Denninger  writes:
>   
>> The problem appearsa to lie in the "nested loop", and I don't understand
>> why that's happening.
>> 
> It looks to me like there are several issues here.
>
> One is the drastic underestimate of the number of rows satisfying the
> permission condition. That leads the planner to think that a nestloop
> join with the other table will be fast, which is only right if there are
> just one or a few rows coming out of "forum".  With sixty-some rows you
> get sixty-some repetitions of the scan of the other table, which loses.
>   
"Loses" isn't quite the right word... :)
> Problem number two is the overeager use of a BitmapAnd to add on another
> index that isn't really very selective.  That might be a correct
> decision but it looks fishy here.  We rewrote choose_bitmap_and a couple
> of times to try to fix that problem ... what PG version is this exactly?
>   
$ psql ticker
Welcome to psql 8.3.6, the PostgreSQL interactive terminal.

> The third thing that looks fishy is that it's using unqualified index
> scans for no apparent reason.  Have you got enable_seqscan turned off,
> and if so what happens when you fix that?  What other nondefault planner
> settings are you using?
>   
None; here is the relevant section of the postgresql.conf file:

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0# measured on an arbitrary scale
#random_page_cost = 4.0 # same scale as above
#cpu_tuple_cost = 0.01  # same scale as above
#cpu_index_tuple_cost = 0.005   # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
#effective_cache_size = 128MB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5# range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0   # selects default based on effort
#geqo_selection_bias = 2.0  # range 1.5-2.0

# - Other Planner Options -

default_statistics_target = 100 # range 1-1000
#constraint_exclusion = off
#from_collapse_limit = 8
#join_collapse_limit = 8# 1 disables collapsing of explicit
# JOIN clauses

All commented out - nothing set to non-defaults, other than the default
statistics target.
> But anyway, the big problem seems to be poor selectivity estimates for 
> conditions like "(permission & 127) = permission".  I have bad news for
> you: there is simply no way in the world that Postgres is not going to
> suck at estimating that, because the planner has no knowledge whatsoever
> of the behavior of "&".  You could consider writing and submitting a
> patch that would teach it something about that, but in the near term
> it would be a lot easier to reconsider your representation of
> permissions.  You'd be likely to get significantly better results,
> not to mention have more-readable queries, if you stored them as a group
> of simple boolean columns.
>
>   regards, tom lane
>   
Ugh.

The issue here is that the permission structure is quite extensible by
the users of the code; there are defined bits (Bit 4, for example, means
that the user is an "ordinary user" and has a login account) but the
upper bits are entirely administrator-defined and may vary from one
installation to another (and do)

The bitmask allows the setting of multiple permissions but the table
definition doesn't have to change (well, so long as the bits fit into a
word!)  Finally, this is a message forum - the actual code itself is
template-driven and the bitmask permission structure is ALL OVER the
templates; getting that out of there would be a really nasty rewrite,
not to mention breaking the user (non-developer, but owner)
extensibility of the current structure.

Is there a way to TELL the planner how to deal with this, even if it
makes the SQL non-portable or is a hack on the source mandatory?

For the particular instance where this came up it won't be murderous to
omit the bitmask check from the query, as there are no "owner/moderator
only" sub-forums (the one place where not checking that would bite HARD
as it would allow searches of "hidden" content by ordinary users.) 
However, there are other installations where this will be a bigger deal;
I can in the immediate term put that query into the config file (instead
of hard-coding it) so for people who can't live with the performance
they can make the tradeoff decision.


-- Karl
begin:vcard
fn:Karl Denninger
n:Denninger;Karl
org:Cuda Systems LLC
adr;dom:;;314 Olde Post Road;Niceville;FL;32578
email;internet:k...@denninger.net
tel;work:850-376-9364
tel;fax:850-897-9364
x-mozilla-html:TRUE
url

Re: [SQL] [PERFORM] SQL Query Performance - what gives?

2009-08-18 Thread Tom Lane
Karl Denninger  writes:
> The problem appearsa to lie in the "nested loop", and I don't understand
> why that's happening.

It looks to me like there are several issues here.

One is the drastic underestimate of the number of rows satisfying the
permission condition. That leads the planner to think that a nestloop
join with the other table will be fast, which is only right if there are
just one or a few rows coming out of "forum".  With sixty-some rows you
get sixty-some repetitions of the scan of the other table, which loses.

Problem number two is the overeager use of a BitmapAnd to add on another
index that isn't really very selective.  That might be a correct
decision but it looks fishy here.  We rewrote choose_bitmap_and a couple
of times to try to fix that problem ... what PG version is this exactly?

The third thing that looks fishy is that it's using unqualified index
scans for no apparent reason.  Have you got enable_seqscan turned off,
and if so what happens when you fix that?  What other nondefault planner
settings are you using?

But anyway, the big problem seems to be poor selectivity estimates for 
conditions like "(permission & 127) = permission".  I have bad news for
you: there is simply no way in the world that Postgres is not going to
suck at estimating that, because the planner has no knowledge whatsoever
of the behavior of "&".  You could consider writing and submitting a
patch that would teach it something about that, but in the near term
it would be a lot easier to reconsider your representation of
permissions.  You'd be likely to get significantly better results,
not to mention have more-readable queries, if you stored them as a group
of simple boolean columns.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance