Re: [GENERAL] Rules, Windows and ORDER BY

2012-08-24 Thread Tom Lane
Martijn van Oosterhout  writes:
> On Fri, Aug 24, 2012 at 09:32:32AM +, Jason Dusek wrote:
>> Why are the individual indices not useful? The tests that the
>> query does -- equality on key and realm and ordering on at --
>> are each supported by indices. Does it have to do with the cost
>> of loading the three indices?

> I'm not entirely sure, but I'll take a stab at it. I think it has to do
> with the fact that you want order. Combining multiple indexes so you
> use them at the same time works as a BitmapAnd. That is, it uses each
> index to determine blocks that are interesting, then finds the blocks
> that are listed by all the indexes, and then it loads those blocks and
> checks them.

Yeah.  While you *can* in principle solve the problem with the
individual indexes, it's much less efficient than a single index.
In particular, BitmapAnd plans are far from being a magic bullet
for combining two individually-not-very-selective conditions.
(That realm constraint is surely not very selective; dunno about
the key one.)  That implies reading a large number of entries from
each index, forming a rather large bitmap for each one, and then
ANDing those bitmaps to get a smaller one.  And even after all that
work, you're still not done, because you have no idea which bit in
the bitmap represents the row with largest "at" value.
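A toy Python model may make that cost concrete. It is only a sketch, not PostgreSQL's executor: row ids stand in for heap tuple locations, an integer `at` stands in for timestamptz, real bitmaps are organized by page (and can go lossy), and `build_bitmap_index` / `latest_via_bitmap_and` are hypothetical names. What it shows is that after the AND, every surviving row still has to be fetched and compared to find the largest `at`.

```python
from collections import defaultdict

# Toy model only: row ids stand in for heap tuple locations and an int stands in
# for timestamptz. Real PostgreSQL bitmaps are organized by page, not by row id.

def build_bitmap_index(rows, column):
    """Map each value of `column` to an int whose bit i is set when rows[i] has that value."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index


def latest_via_bitmap_and(rows, k_index, realm_index, k, realm):
    """AND the k and realm bitmaps, then fetch every surviving row to find the max 'at'.

    Returns (row_or_None, rows_fetched). The combined bitmap carries no
    'at' ordering, so each candidate must be fetched and compared.
    """
    bitmap = k_index.get(k, 0) & realm_index.get(realm, 0)
    best, fetched, row_id = None, 0, 0
    while bitmap:
        if bitmap & 1:  # row_id survived the AND
            fetched += 1
            row = rows[row_id]
            if best is None or row["at"] > best["at"]:
                best = row
        bitmap >>= 1
        row_id += 1
    return best, fetched


rows = [{"k": b"a" if i % 2 == 0 else b"b",
         "realm": b"r2" if i % 3 == 0 else b"r1",
         "at": i} for i in range(30)]
k_index = build_bitmap_index(rows, "k")
realm_index = build_bitmap_index(rows, "realm")
best, fetched = latest_via_bitmap_and(rows, k_index, realm_index, b"a", b"r1")
print(best["at"], fetched)  # prints: 28 10
```

With weakly selective `k` and `realm` values, the number of rows fetched grows with the number of matching rows, not with the single row the query actually wants.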

> In theory you could BitmapAnd the 'k' and 'realm' indexes and then scan
> the 'at' index only checking rows that the bitmap shows are
> interesting.  But I'm not sure if postgres can do that.

No, it can't, and that likely wouldn't be a very effective plan anyway;
you could end up scanning a very large fraction of the "at" index, since
you'd have to start at the end (the latest entry anywhere in the table).
Even if you didn't make many trips to the heap, that's not cheap.
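The hypothetical plan can be modelled in the same toy fashion. In this sketch (hypothetical names; a Python set stands in for the BitmapAnd result and a sorted list of row ids stands in for the btree on `at`), the scan starts at the newest entry in the whole table and stops at the first row in the set, so the work depends on how long ago that key was last written rather than on how many rows it has.

```python
# Toy model only: a Python set stands in for the BitmapAnd of the k and realm
# indexes, and a list of row ids sorted by 'at' stands in for the btree on 'at'.

def build_at_index(rows):
    """Row ids sorted by 'at', oldest first."""
    return sorted(range(len(rows)), key=lambda i: rows[i]["at"])


def latest_via_at_scan(rows, at_index, matching_ids):
    """Read the 'at' index backward and stop at the first row in matching_ids.

    Returns (row_or_None, entries_read). The scan always begins at the newest
    entry in the whole table, whatever k and realm are.
    """
    entries_read = 0
    for row_id in reversed(at_index):
        entries_read += 1
        if row_id in matching_ids:
            return rows[row_id], entries_read
    return None, entries_read


# One key written long ago, then many newer writes to a different key.
rows = [{"k": b"old" if i == 0 else b"new", "realm": b"r", "at": i}
        for i in range(10_000)]
at_index = build_at_index(rows)
old_ids = {i for i, r in enumerate(rows) if r["k"] == b"old" and r["realm"] == b"r"}
row, entries_read = latest_via_at_scan(rows, at_index, old_ids)
print(row["at"], entries_read)  # prints: 0 10000
```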

In contrast, given a three-column btree index organized with the
equality-constrained columns first, the btree code can descend the
index tree straight to the entry you want.  We've expended a lot of
sweat on optimizing that case, and it will absolutely blow the doors
off anything involving a bitmap scan.

Of course the downside is that the three-column index might be
relatively useless for queries of forms other than this one.
So it's a tradeoff between flexibility and performance.  But since
the OP is asking, I'm assuming he cares a lot about performance of
queries of this exact form.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Rules, Windows and ORDER BY

2012-08-24 Thread Martijn van Oosterhout
On Fri, Aug 24, 2012 at 09:32:32AM +, Jason Dusek wrote:
> 2012/8/23 Tom Lane :
> > Jason Dusek  writes:
> >>   CREATE TABLE kv
> >>   ( k     bytea       NOT NULL,
> >>     at    timestamptz NOT NULL,
> >>     realm bytea       NOT NULL,
> >>     v     bytea       NOT NULL );
> >>   CREATE INDEX ON kv USING hash(k);
> >>   CREATE INDEX ON kv (at);
> >>   CREATE INDEX ON kv USING hash(realm);
> >
> >>   SELECT * FROM kv WHERE k = $1 AND realm = $2 ORDER BY at DESC LIMIT 1;
> >
> > If you want to make that fast, an index on (k,realm,at) would
> > help.  Those indexes that you did create are next to useless
> > for this, and furthermore hash indexes are quite unsafe for
> > production.
> 
> Why are the individual indices not useful? The tests that the
> query does -- equality on key and realm and ordering on at --
> are each supported by indices. Does it have to do with the cost
> of loading the three indices?

I'm not entirely sure, but I'll take a stab at it. I think it has to do
with the fact that you want order. Combining multiple indexes so you
use them at the same time works as a BitmapAnd. That is, it uses each
index to determine blocks that are interesting, then finds the blocks
that are listed by all the indexes, and then it loads those blocks and
checks them.

The problem here is that you want ORDER BY at, which makes the above
scheme fall apart, because order is not preserved. So it falls back on
either scanning the 'at' index and probing the rows to check whether
they match, or using all the indexes and then sorting the result.

In theory you could BitmapAnd the 'k' and 'realm' indexes and then scan
the 'at' index only checking rows that the bitmap shows are
interesting.  But I'm not sure if postgres can do that.

Anyway, the suggested three column index will match your query in a
single lookup and hence be much faster than any of the above
suggestions, so if this is a really important query then it may be
worth it here.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [GENERAL] Rules, Windows and ORDER BY

2012-08-24 Thread Jason Dusek
2012/8/23 Tom Lane :
> Jason Dusek  writes:
>> I have a simple table of keys and values which periodically
>> receives updated values. It's desirable to keep older values
>> but, most of the time, we query only for the latest value of a
>> particular key.
>
>>   CREATE TABLE kv
>>   ( k     bytea       NOT NULL,
>>     at    timestamptz NOT NULL,
>>     realm bytea       NOT NULL,
>>     v     bytea       NOT NULL );
>>   CREATE INDEX ON kv USING hash(k);
>>   CREATE INDEX ON kv (at);
>>   CREATE INDEX ON kv USING hash(realm);
>
>>   SELECT * FROM kv WHERE k = $1 AND realm = $2 ORDER BY at DESC LIMIT 1;
>
> If you want to make that fast, an index on (k,realm,at) would
> help.  Those indexes that you did create are next to useless
> for this, and furthermore hash indexes are quite unsafe for
> production.

Thanks for pointing out the unsafety of hash indexes. I think I
got in the habit of using them for a project with large,
temporary data sets.

Why are the individual indices not useful? The tests that the
query does -- equality on key and realm and ordering on at --
are each supported by indices. Does it have to do with the cost
of loading the three indices?

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B




Re: [GENERAL] Rules, Windows and ORDER BY

2012-08-23 Thread Tom Lane
Jason Dusek  writes:
> I have a simple table of keys and values which periodically
> receives updated values. It's desirable to keep older values
> but, most of the time, we query only for the latest value of a
> particular key.

>   CREATE TABLE kv
>   ( k     bytea       NOT NULL,
>     at    timestamptz NOT NULL,
>     realm bytea       NOT NULL,
>     v     bytea       NOT NULL );
>   CREATE INDEX ON kv USING hash(k);
>   CREATE INDEX ON kv (at);
>   CREATE INDEX ON kv USING hash(realm);

>   SELECT * FROM kv WHERE k = $1 AND realm = $2 ORDER BY at DESC LIMIT 1;

If you want to make that fast, an index on (k,realm,at) would help.
Those indexes that you did create are next to useless for this,
and furthermore hash indexes are quite unsafe for production.

regards, tom lane

