[GENERAL] Suboptimal query plan fixed by replacing OR with UNION

2012-07-05 Thread Steven Schlansker
9d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
               ->  Bitmap Index Scan on account_id_user_id_idx  (cost=0.00..17.56 rows=3 width=0)
                     Index Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
   ->  Bitmap Heap Scan on account  (cost=37.20..3188.55 rows=803 width=160)
         Recheck Cond: (user_id = public.account.user_id)
         ->  Bitmap Index Scan on account_user_id_idx  (cost=0.00..37.00 rows=803 width=0)
               Index Cond: (user_id = public.account.user_id)
(10 rows)

ness_user=# explain SELECT * FROM account WHERE id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}');
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on account  (cost=17.56..29.58 rows=3 width=160)
   Recheck Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
   ->  Bitmap Index Scan on account_id_user_id_idx  (cost=0.00..17.56 rows=3 width=0)
         Index Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
(4 rows)

(where "reasonable" is defined as "not a sequential scan")

Upon seeing this, I had a crazy idea: what if I just pasted the two queries 
together with a UNION DISTINCT?

ness_user=# explain SELECT * FROM account WHERE
ness_user-# user_id in (SELECT user_id FROM account WHERE id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}')) UNION DISTINCT
ness_user-# SELECT * FROM account WHERE
ness_user-# id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}');

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=3342.22..3366.35 rows=2413 width=160)
   ->  Append  (cost=66.79..3281.90 rows=2413 width=160)
         ->  Nested Loop  (cost=66.79..3228.18 rows=2410 width=160)
               ->  HashAggregate  (cost=29.59..29.60 rows=1 width=16)
                     ->  Bitmap Heap Scan on account  (cost=17.56..29.58 rows=3 width=16)
                           Recheck Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
                           ->  Bitmap Index Scan on account_id_user_id_idx  (cost=0.00..17.56 rows=3 width=0)
                                 Index Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
               ->  Bitmap Heap Scan on account  (cost=37.20..3188.55 rows=803 width=160)
                     Recheck Cond: (user_id = public.account.user_id)
                     ->  Bitmap Index Scan on account_user_id_idx  (cost=0.00..37.00 rows=803 width=0)
                           Index Cond: (user_id = public.account.user_id)
         ->  Bitmap Heap Scan on account  (cost=17.56..29.58 rows=3 width=160)
               Recheck Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
               ->  Bitmap Index Scan on account_id_user_id_idx  (cost=0.00..17.56 rows=3 width=0)
                     Index Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
(16 rows)


Wow!  Changing the query from using an OR clause to a UNION DISTINCT with two 
SELECTs reduced the cost from 1379485.60 to 3366.35!  And the gains are 
realized when you actually execute the query.
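
(For anyone reproducing this: EXPLAIN ANALYZE, rather than plain EXPLAIN, actually 
runs the statement and reports real execution times instead of just the planner's 
cost estimate.  A sketch, with a placeholder UUID standing in for the real values 
elided above:)

-- sketch only; the UUID below is a placeholder, not one of the real keys
EXPLAIN ANALYZE
SELECT * FROM account
 WHERE user_id IN (SELECT user_id FROM account
                    WHERE id = ANY('{00000000-0000-0000-0000-000000000001}'::uuid[]))
UNION DISTINCT
SELECT * FROM account
 WHERE id = ANY('{00000000-0000-0000-0000-000000000001}'::uuid[]);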

Why is using an OR so awful here?  Why does it pick a sequential scan?  Is this 
an optimizer bug or have I missed something in my queries?

Thanks much for any advice,
Steven Schlansker




Re: [GENERAL] Suboptimal query plan fixed by replacing OR with UNION

2012-07-05 Thread Steven Schlansker

On Jul 5, 2012, at 3:51 PM, Tom Lane wrote:

> Steven Schlansker  writes:
>> Why is using an OR so awful here?
> 
> Because the OR stops it from being a join (it possibly needs to return
> some rows that are not in the semijoin of the two tables).
> 
>> Why does it pick a sequential scan?  Is this an optimizer bug
> 
> No.  It can't transform OR into a UNION because the results might not
> be the same.  I assume you don't care about removal of duplicates, or
> have some reason to know that there won't be any ... but the planner
> doesn't know that.
> 

Thanks for the insight here.  It still seems unfortunate that it picks a
sequential scan -- but if there really is no more efficient way to do this,
I will just rewrite the query.
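
To illustrate the duplicate-removal difference Tom describes, here is a toy 
example (not from my schema):

-- toy example: two identical rows in a hypothetical table
CREATE TEMP TABLE t (x int, y int);
INSERT INTO t VALUES (1, 1), (1, 1);

SELECT * FROM t WHERE x = 1 OR y = 1;   -- returns both identical rows

SELECT * FROM t WHERE x = 1
UNION
SELECT * FROM t WHERE y = 1;            -- returns one row; UNION removed the duplicate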

Steven




Re: [GENERAL] Suboptimal query plan fixed by replacing OR with UNION

2012-07-06 Thread Steven Schlansker

On Jul 5, 2012, at 6:35 PM, Jasen Betts wrote:

> I note you've decided to rewrite this query as a union 
> 
>> SELECT * FROM account
>>  WHERE user_id in
>>    (SELECT user_id FROM account
>>      WHERE id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'))
>>  OR
>>    id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}');
> 
> I notice both arrays (used with = ANY) have the exact same content,
> 
> if this is always true you can use a CTE here for the ID=ANY(...)
> query and reference the CTE on both sides of the union.
> 

Thanks for the idea!  I'll be sure to incorporate that.  Doesn't fix the 
unfortunate behavior with OR, though.

> WITH i as (
>   SELECT * FROM account WHERE id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}')
> )
> SELECT * from i
> UNION DISTINCT 
> SELECT account.* from account join i on i.user_id = account.user_id;
> 
> -- 
> ⚂⚃ 100% natural
> 
> 




Re: [GENERAL] Suboptimal query plan fixed by replacing OR with UNION

2012-07-09 Thread Steven Schlansker
On Jul 6, 2012, at 9:24 PM, Gurjeet Singh wrote:

> On Thu, Jul 5, 2012 at 7:16 PM, Steven Schlansker  wrote:
> 
> On Jul 5, 2012, at 3:51 PM, Tom Lane wrote:
> 
> > Steven Schlansker  writes:
> >> Why is using an OR so awful here?
> >
> > Because the OR stops it from being a join (it possibly needs to return
> > some rows that are not in the semijoin of the two tables).
> >
> >> Why does it pick a sequential scan?  Is this an optimizer bug
> >
> > No.  It can't transform OR into a UNION because the results might not
> > be the same.  I assume you don't care about removal of duplicates, or
> > have some reason to know that there won't be any ... but the planner
> > doesn't know that.
> >
> 
> Thanks for the insight here.  It still seems unfortunate that it picks a
> sequential scan -- but if there really is no more efficient way to do this,
> I will just rewrite the query.
> 
> It might not be applicable to this case (because of the use of ANY in the second 
> branch of the OR clause), but some databases provide a feature called 
> OR-optimization, where the optimizer breaks up the query at OR clause 
> boundaries and uses the UNION ALL operator to combine the resulting queries, just 
> like you did. The optimizer does need to add additional AND clauses to some of 
> the branches to make sure the result set is not affected.
> 

That sounds like a great optimization for Postgres, but unfortunately it's far 
outside of my skill set / time to contribute, so I'd have to wait for a "real" 
PG dev to get to it :)
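
For concreteness, my understanding of the rewrite such an OR-optimization would 
produce is roughly this (a sketch over a hypothetical table, not actual planner 
behavior):

-- hypothetical table, for illustration only
CREATE TEMP TABLE t (a int, b int);

-- original form
SELECT * FROM t WHERE a = 1 OR b = 2;

-- decomposed form; the added predicate keeps rows that match both branches
-- from being returned twice (IS DISTINCT FROM also handles NULLs correctly)
SELECT * FROM t WHERE a = 1
UNION ALL
SELECT * FROM t WHERE b = 2 AND a IS DISTINCT FROM 1;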

> Just a thought.
> -- 
> Gurjeet Singh
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
> 




[GENERAL] Ignore hash indices on replicas

2012-07-10 Thread Steven Schlansker
I'm using Postgres hash indices on the master of a streaming replication setup.
As is documented, hash indices are not WAL-logged, so the replica does not have 
access to them.

I understand that the current wisdom is "don't use hash indices", but 
(unfortunately?) I have benchmarks that
show that our particular application is faster by quite a bit when a hash index 
is available.

I assume that fixing the hash index logging issue hasn't been a priority due to 
low interest / technical limitations, but I'm curious about a stopgap measure -- 
can we somehow configure Postgres to ignore hash indices on a replica and use 
other B-tree indices or even a sequential scan?  I know I can do this on a 
per-connection basis by disabling various index lookup methods, but it'd be 
nice if it just ignored invalid indices on its own.
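
(The blunt per-connection workaround I have in mind is something like this in a 
replica session; note that it steers the planner away from all index scans for 
that session, not just the hash ones:)

SET enable_indexscan = off;   -- planner falls back to other plans, e.g. seq scans
SET enable_bitmapscan = off;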

I've not seen much reference to this problem around, but I do apologize if I've 
missed it in the manual or it is extremely obvious how you do this :)

Thanks,
Steven




Re: [GENERAL] Replication/cloning: rsync vs modification dates?

2012-07-16 Thread Steven Schlansker
I think it's pretty easy to show that timestamp+size isn't good enough to do 
this 100% reliably.

Imagine that your timestamps have a millisecond resolution.  I assume the exact 
resolution will vary based on OS / filesystem, but the point remains the same 
whatever it is.

You can have multiple writes occur in the same quantized "instant".

If the prior rsync just happened to catch the first write (at T+0.1ms) in that 
instant but not the second (which happened at T+0.4ms), the second may not be 
transferred.  But the modification time is the same for the two writes.

All that said, I think the chances of this actually happening are vanishingly 
small.  I personally use rsync without checksums and have had no problems.

On Jul 16, 2012, at 2:42 PM, Chris Angelico wrote:

> On Tue, Jul 17, 2012 at 4:35 AM, Sergey Konoplev
>  wrote:
>> On Mon, Jul 16, 2012 at 8:01 PM, Chris Angelico  wrote:
>>> On Tue, Jul 17, 2012 at 1:58 AM, Michael Nolan  wrote:
>>>> As I understand the docs for rsync, it will use both mod time and file size
>>>> if told not to do checksums.
>> 
>> I wonder if it is correct in general to use mtime and size to perform
>> these checks from the point of view of PostgreSQL.
>> 
>> If it works with the current version then is there a guaranty that it
>> will work with the future versions?
> 
> That was my exact question. Ideally, I'd like to hear from someone who
> works with the Postgres internals, but the question may not even be
> possible to answer.
> 
> ChrisA
> 




[GENERAL] Postgres will not compile on Mac 10.8 with contrib/uuid-ossp

2012-07-30 Thread Steven Schlansker
It is not possible to compile Postgres contrib/uuid-ossp on the newest release 
of Mac OS X, 10.8

The specific compile error: 

make -C uuid-ossp install

/bin/sh ../../config/install-sh -c -d '/usr/local/Cellar/postgresql/9.1.3/lib'
/usr/bin/clang -Os -w -pipe -march=native -Qunused-arguments -I/usr/local/Cellar/ossp-uuid/1.6.2/include -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -I. -I. -I../../src/include -I/usr/local/Cellar/readline/6.2.2/include -I/usr/include/libxml2 -I/usr/include/libxml2 -c -o uuid-ossp.o uuid-ossp.c

In file included from uuid-ossp.c:27:
In file included from /usr/local/Cellar/ossp-uuid/1.6.2/include/uuid.h:38:
/usr/include/unistd.h:689:26: error: expected identifier
int  gethostuuid(uuid_t, const struct timespec *) __OSX_AVAILABLE_STARTING(__MAC_10_5, __IPHONE_2_0);
                         ^
In file included from uuid-ossp.c:27:
/usr/local/Cellar/ossp-uuid/1.6.2/include/uuid.h:94:24: error: typedef redefinition with different types ('struct uuid_st' vs '__darwin_uuid_t' (aka 'unsigned char [16]'))
typedef struct uuid_st uuid_t;
                       ^
/usr/include/uuid/uuid.h:42:25: note: previous definition is here
typedef __darwin_uuid_t uuid_t;
                        ^
In file included from uuid-ossp.c:27:
/usr/local/Cellar/ossp-uuid/1.6.2/include/uuid.h:107:22: error: conflicting types for 'uuid_compare'
extern uuid_rc_t uuid_compare  (const uuid_t  *_uuid, const uuid_t *_uuid2, int *_result);
                 ^
/usr/include/uuid/uuid.h:59:5: note: previous declaration is here
int uuid_compare(const uuid_t uu1, const uuid_t uu2);
    ^
3 errors generated.
make[2]: *** [uuid-ossp.o] Error 1
make[1]: *** [install-uuid-ossp-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....


Full log available here (not mine, but I have the exact same problem):
https://gist.github.com/2287209

There is a workaround (disable uuid-ossp) but obviously this does not help 
those of us who require UUID support.

There are a number of bugs open:
https://github.com/mxcl/homebrew/issues/13639
https://trac.macports.org/ticket/35153

and a proposed fix, which resolves the problem for me:

> This is caused by the inclusion of the system uuid.h in pwd.h. Just add the 
> line :
> 
> #define _XOPEN_SOURCE
> 
> at the first line of uuid-ossp.c, and it should compile fine.


I am not sure what the "correct" fix is, but adding the above definition does 
fix the issue.




Re: [GENERAL] Postgres will not compile on Mac 10.8 with contrib/uuid-ossp

2012-07-30 Thread Steven Schlansker

On Jul 30, 2012, at 7:35 PM, Tom Lane  wrote:

> Steven Schlansker  writes:
>> It is not possible to compile Postgres contrib/uuid-ossp on the newest 
>> release of Mac OS X, 10.8
> 
> This looks like some variant of the same issue that OSSP's uuid
> package has had on Macs all along, to wit an unreliable method for
> avoiding conflicts with any OS-supplied uuid type.  Back when we
> last looked at this,
> http://archives.postgresql.org/pgsql-hackers/2007-11/msg00551.php
> we concluded that what we had to do was not include <unistd.h>
> ahead of uuid.h, and as far as I can tell we're still not doing that;
> for me, contrib/uuid-ossp still builds fine on Lion with uuid 1.6.2.

Looks right to me.

> 
> So assuming you're using a stock copy of 1.6.2,

(I am)

> that leaves <unistd.h> as the only variable in the equation that could have changed.  It's
> not obvious from here exactly how it changed, but in any case this
> fight is ultimately between OSSP uuid and OS X; there's not a lot
> Postgres can (or should) do to fix it.  It's a bit distressing that
> OSSP doesn't seem to have made any new uuid releases since 2008,
> but maybe if you prod them they'll do something about this.
> 
>   regards, tom lane

I've sent a message upstream.  It seems that both their bug tracker and forum 
are either nonexistent or broken enough that they might as well be.  So I will 
work on the assumption that this bug won't get fixed upstream...

Would it be reasonable to include the _XOPEN_SOURCE define in the contrib 
module?
It at least fixes this issue and hopefully wouldn't cause any ill effects.

There's a patch at 
https://github.com/stevenschlansker/homebrew/commit/d86e17cbcc5d287d7a393a6754aa8f94b995c5ea

It's not terribly extensively tested but it does compile and I would be very 
surprised if it caused problems.


Re: [GENERAL] Ignore hash indices on replicas

2012-08-20 Thread Steven Schlansker

On Aug 19, 2012, at 8:01 PM, Scott Marlowe  wrote:

> On Tue, Jul 10, 2012 at 1:09 AM, Steven Schlansker  
> wrote:
>> I'm using Postgres hash indices on a streaming replica master.
>> As is documented, hash indices are not logged, so the replica does not have 
>> access to them.
>> 
>> I understand that the current wisdom is "don't use hash indices", but 
>> (unfortunately?) I have benchmarks that
>> show that our particular application is faster by quite a bit when a hash 
>> index is available.
> 
> You could use a Slony slave and have different indexes etc. between
> master and slave, but it's more complex to set up, maintain and monitor
> for most people.

Thanks for the suggestion, but we finally have replication working in a way we 
understand / like and I don't really consider this a viable option.  The 
built-in replication has been treating us very well.





Re: [GENERAL] Ignore hash indices on replicas

2012-08-20 Thread Steven Schlansker

On Aug 19, 2012, at 2:37 PM, Jeff Davis  wrote:

> On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote:
>> I understand that the current wisdom is "don't use hash indices", but
>> (unfortunately?) I have benchmarks that
>> show that our particular application is faster by quite a bit when a
>> hash index is available.
> 
> Can you publish the results somewhere? It might provoke some interest.

I might be able to spend some time looking at making this public, but the 
general parameters are:

122M rows; the lookup key is a UUID column.  Lookups are ~1000 random keys at a 
time (as in, a giant SELECT * FROM table WHERE key IN (?,?,?,?,…)).
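
In rough outline, the shape is something like this (table and column names are 
placeholders, not our real schema):

-- hypothetical schema, for illustration only
CREATE TABLE items (key uuid PRIMARY KEY, payload text);
CREATE INDEX items_key_hash_idx ON items USING hash (key);

-- each lookup passes ~1000 random UUIDs; two placeholder values shown here
SELECT * FROM items
 WHERE key IN ('00000000-0000-0000-0000-000000000001',
               '00000000-0000-0000-0000-000000000002');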

> 
>> I assume that fixing the hash index logging issue hasn't been a
>> priority due to low interest / technical limitations, but I'm curious
>> for a stopgap measure -- can we somehow configure Postgres to ignore
>> hash indices on a replica, using other b-tree indices or even a
>> sequential scan?  I know I can do this on a per-connection basis by
>> disabling various index lookup methods, but it'd be nice if it just
>> ignored invalid indices on its own.
> 
> This might work for you:
> 
> http://sigaev.ru/git/gitweb.cgi?p=plantuner.git;a=blob;hb=HEAD;f=README.plantuner

Thanks for the link; that looks interesting.  It is a bit unfortunate that I 
would have to find and exclude indices manually, but very doable...
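
(For the manual part, a catalog query along these lines should list the hash 
indexes to exclude; just a sketch:)

-- enumerate hash indexes and the tables they belong to
SELECT idx.relname AS index_name, tbl.relname AS table_name
  FROM pg_index i
  JOIN pg_class idx ON idx.oid = i.indexrelid
  JOIN pg_class tbl ON tbl.oid = i.indrelid
  JOIN pg_am am    ON am.oid  = idx.relam
 WHERE am.amname = 'hash';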





Re: [GENERAL] Confirming \timing output

2012-08-23 Thread Steven Schlansker

On Aug 23, 2012, at 11:13 AM, "Gauthier, Dave"  wrote:

> With \timing set on, I run an update statement and it reports
>  
> Time: 0.524 ms
>  
> Is that really 0.524 ms?  As in 524 nanoseconds?

0.524 ms = 524,000 ns

Perhaps you meant microseconds?

0.524 ms = 524 µs

If all your data happens to be in RAM cache, simple queries can execute very 
fast!  Unless you have a reason to believe it's wrong, I would trust it to be 
accurate :-)

>  
> Also, is this wallclock time or some sort of indication of how much cpu it 
> took?
>  
> Thanks for any answers !
>  


\timing measures wall time.  There's a more detailed discussion of the 
difference between this and e.g. EXPLAIN ANALYZE here:

http://postgresql.1045698.n5.nabble.com/What-does-timing-measure-td4289329.html
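
(A quick way to see both numbers side by side, using a hypothetical table named items:)

\timing on
SELECT count(*) FROM items;                  -- psql prints "Time: ...", wall-clock including the client round trip
EXPLAIN ANALYZE SELECT count(*) FROM items;  -- reports server-side execution time per plan node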




