Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tom Lane
"Tomas Vondra"  writes:
> Could the planner do this reasoning, i.e. see if there's a NOT NULL
> constraint on the column and change the plan accordingly?

Possibly.  I'm not sure how expensive it would be.  Right now, a
structural change of that sort would have to be done pretty early in
planning, and we don't have a lot of info collected about the tables
at that point.  I'm afraid it would involve a special-purpose catalog
probe, which would add a lot of overhead.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tomas Vondra
On 15 Září 2011, 16:26, Tom Lane wrote:
> Yang Zhang  writes:
>> On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra  wrote:
>>> [ use a left join instead of NOT IN ]
>
>> This worked great, thank you.  Too bad the planner isn't smart enough
>> to do this yet!
>
> It never will be, because they're not equivalent queries.  NOT IN has
> different (and very hard-to-optimize) behavior in the face of NULLs in
> the right-hand table.

True, but in this case the subquery selects a PK column (at least I
believe so, we haven't seen the structure), so there should be no NULLs.
So the result should be the same.

Could the planner do this reasoning, i.e. see if there's a NOT NULL
constraint on the column and change the plan accordingly?

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tom Lane
Yang Zhang  writes:
> On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra  wrote:
>> [ use a left join instead of NOT IN ]

> This worked great, thank you.  Too bad the planner isn't smart enough
> to do this yet!

It never will be, because they're not equivalent queries.  NOT IN has
different (and very hard-to-optimize) behavior in the face of NULLs in
the right-hand table.

The moral of the story is avoid NOT IN.  You could use NOT EXISTS,
which is less of a notational change than recasting as a left join.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tomas Vondra
On 15 Září 2011, 11:07, Yang Zhang wrote:
> On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra  wrote:
>> On 15 Září 2011, 9:53, Yang Zhang wrote:
>>> I have a simple query that's been running for a while, which is fine,
>>> but it seems to be running very slowly, which is a problem:
>>>
>>> mydb=# explain select user_id from den where user_id not in (select
>>> duid from user_mappings) and timestamp between '2009-04-01' and
>>> '2010-04-01';
>>>
>>>            QUERY PLAN
>>> ---
>>>  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
>>>    Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without
>>> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp
>>> without time zone) AND (NOT (SubPlan 1)))
>>>    SubPlan 1
>>>      ->  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
>>>            ->  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780
>>> width=4)
>>>
>>> user_mappings is fairly small:
>>
>> The problem is that for each of the 22634720 rows in "den" a separate
>> uncorrelated subquery (a seq scan on user_mappings) has to be executed.
>> Althogh the subquery is not very expensive, multiplied by the number of
>> rows in "den" the total cost is extreme.
>>
>> The only solution is to get rid of the "not in" subquery - try to turn
>> it
>> to a join like this:
>>
>> SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid)
>> WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01')
>>  AND (duid IS NULL)
>>
>> That should give the same result I guess.
>
> This worked great, thank you.  Too bad the planner isn't smart enough
> to do this yet!

It's probably a bit more complicated I guess - there are probably cases
when this would not work this great.

Anyway, you should consider Toby Corkindale's recommendation and check if
there's an index on that timestamp column - that might further improve the
performance. (I don't think an index on user_id might be useful in this
case).

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra  wrote:
> On 15 Září 2011, 9:53, Yang Zhang wrote:
>> I have a simple query that's been running for a while, which is fine,
>> but it seems to be running very slowly, which is a problem:
>>
>> mydb=# explain select user_id from den where user_id not in (select
>> duid from user_mappings) and timestamp between '2009-04-01' and
>> '2010-04-01';
>>
>>            QUERY PLAN
>> ---
>>  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
>>    Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without
>> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp
>> without time zone) AND (NOT (SubPlan 1)))
>>    SubPlan 1
>>      ->  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
>>            ->  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780
>> width=4)
>>
>> user_mappings is fairly small:
>
> The problem is that for each of the 22634720 rows in "den" a separate
> uncorrelated subquery (a seq scan on user_mappings) has to be executed.
> Althogh the subquery is not very expensive, multiplied by the number of
> rows in "den" the total cost is extreme.
>
> The only solution is to get rid of the "not in" subquery - try to turn it
> to a join like this:
>
> SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid)
> WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01')
>  AND (duid IS NULL)
>
> That should give the same result I guess.

This worked great, thank you.  Too bad the planner isn't smart enough
to do this yet!

>
> Tomas
>
>



-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Toby Corkindale

On 15/09/11 17:53, Yang Zhang wrote:

I have a simple query that's been running for a while, which is fine,
but it seems to be running very slowly, which is a problem:

mydb=# explain select user_id from den where user_id not in (select
duid from user_mappings) and timestamp between '2009-04-01' and
'2010-04-01';


Do you have indexes on user_id and timestamp?



QUERY PLAN
---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
Filter: (("timestamp">= '2009-04-01 00:00:00'::timestamp without
time zone) AND ("timestamp"<= '2010-04-01 00:00:00'::timestamp
without time zone) AND (NOT (SubPlan 1)))
SubPlan 1
  ->   Materialize  (cost=711.58..1223.38 rows=36780 width=4)
->   Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
width=4)

user_mappings is fairly small:

mydb=# select count(*) from user_mappings;
  count
---
  36780
(1 row)


I know that `den` is large but things seem to be running much slower
than I'd expect.  Nothing in the logs.  Is there any way to inspect
what's going on?  Hesitant to kill the query in case it's almost done,
though I doubt it (didn't have the foresight to expose this
information in the client process - wasn't expecting to run into
this).  Thanks in advance.




--
.signature

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tomas Vondra
On 15 Září 2011, 9:53, Yang Zhang wrote:
> I have a simple query that's been running for a while, which is fine,
> but it seems to be running very slowly, which is a problem:
>
> mydb=# explain select user_id from den where user_id not in (select
> duid from user_mappings) and timestamp between '2009-04-01' and
> '2010-04-01';
>
>QUERY PLAN
> ---
>  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
>Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without
> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp
> without time zone) AND (NOT (SubPlan 1)))
>SubPlan 1
>  ->  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
>->  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780
> width=4)
>
> user_mappings is fairly small:

The problem is that for each of the 22634720 rows in "den" a separate
uncorrelated subquery (a seq scan on user_mappings) has to be executed.
Althogh the subquery is not very expensive, multiplied by the number of
rows in "den" the total cost is extreme.

The only solution is to get rid of the "not in" subquery - try to turn it
to a join like this:

SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid)
WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01')
  AND (duid IS NULL)

That should give the same result I guess.

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread John R Pierce

On 09/15/11 12:53 AM, Yang Zhang wrote:

mydb=# explain select user_id from den where user_id not in (select
duid from user_mappings) and timestamp between '2009-04-01' and
'2010-04-01';

QUERY PLAN
---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
Filter: (("timestamp">= '2009-04-01 00:00:00'::timestamp without
time zone) AND ("timestamp"<= '2010-04-01 00:00:00'::timestamp
without time zone) AND (NOT (SubPlan 1)))
SubPlan 1
  ->   Materialize  (cost=711.58..1223.38 rows=36780 width=4)
->   Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
width=4)

user_mappings is fairly small:


so you have to check if every one of the 22 million rows of 'den' in 
that time range and is not one of the 36000 rows of user_mappings?


how many rows are in that time range?

are timestamp and duid indexed?  have the tables been analyzed any time 
recently?



--
john r pierceN 37, W 122
santa cruz ca mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
Should also add that while the client is under no load (CPU<1%, load
~0.1, mem ~20%), the server looks pretty busy (CPU 90-100% of 1 core,
load ~1.5, mem ~70-80%), but PG is the only thing using resources.

On Thu, Sep 15, 2011 at 12:58 AM, Yang Zhang  wrote:
> If it matters, the client is a JDBC app doing:
>
>    con = DriverManager.getConnection("jdbc:postgresql://localhost:5434/mydb",
> "yang", password)
>    con.setAutoCommit(false)
>    val st = con.prepareStatement("""
>      select user_id from den where user_id not in (select duid from
> user_mappings)
>      and timestamp between '2009-04-01'::date and '2010-04-01'::date;
>    """, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
>    st.setFetchSize(8192)
>    rs = st.executeQuery()
>
> On Thu, Sep 15, 2011 at 12:53 AM, Yang Zhang  wrote:
>> I have a simple query that's been running for a while, which is fine,
>> but it seems to be running very slowly, which is a problem:
>>
>> mydb=# explain select user_id from den where user_id not in (select
>> duid from user_mappings) and timestamp between '2009-04-01' and
>> '2010-04-01';
>>
>>           QUERY PLAN
>> ---
>>  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
>>   Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without
>> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp
>> without time zone) AND (NOT (SubPlan 1)))
>>   SubPlan 1
>>     ->  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
>>           ->  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
>> width=4)
>>
>> user_mappings is fairly small:
>>
>> mydb=# select count(*) from user_mappings;
>>  count
>> ---
>>  36780
>> (1 row)
>>
>> The client is on the same LAN, but only sees packets trickling in:
>>
>> $ sudo tcpdump -i lo port 5434
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
>> 00:44:05.475584 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
>> 3757958746:3757966938, ack 3697417382, win 265, options [nop,nop,TS
>> val 194791416 ecr 194791044], length 8192
>> 00:44:05.475684 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
>> 8192, win 771, options [nop,nop,TS val 194791416 ecr 194791416],
>> length 0
>> 00:44:08.867976 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
>> 8192:15779, ack 1, win 265, options [nop,nop,TS val 194791756 ecr
>> 194791416], length 7587
>> 00:44:08.868019 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
>> 15779, win 771, options [nop,nop,TS val 194791756 ecr 194791756],
>> length 0
>> 00:44:08.870336 IP nuthouse.40349 > nuthouse.5434: Flags [P.], seq
>> 1:19, ack 15779, win 771, options [nop,nop,TS val 194791756 ecr
>> 194791756], length 18
>> 00:44:08.870343 IP nuthouse.5434 > nuthouse.40349: Flags [.], ack 19,
>> win 265, options [nop,nop,TS val 194791756 ecr 194791756], length 0
>> 00:44:12.578220 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
>> 15779:23971, ack 19, win 265, options [nop,nop,TS val 194792127 ecr
>> 194791756], length 8192
>> 00:44:12.618116 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
>> 23971, win 771, options [nop,nop,TS val 194792131 ecr 194792127],
>> length 0
>> 00:44:16.664645 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
>> 23971:32163, ack 19, win 265, options [nop,nop,TS val 194792535 ecr
>> 194792131], length 8192
>> 00:44:16.664755 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
>> 32163, win 771, options [nop,nop,TS val 194792535 ecr 194792535],
>> length 0
>> 00:44:20.465773 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
>> 32163:40355, ack 19, win 265, options [nop,nop,TS val 194792915 ecr
>> 194792535], length 8192
>> 00:44:20.465878 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
>> 40355, win 720, options [nop,nop,TS val 194792915 ecr 194792915],
>> length 0
>> 00:44:24.115273 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
>> 40355:48547, ack 19, win 265, options [nop,nop,TS val 194793280 ecr
>> 194792915], length 8192
>> 00:44:24.115380 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
>> 48547, win 720, options [nop,nop,TS val 194793280 ecr 194793280],
>> length 0
>> 00:44:27.749084 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
>> 48547:56739, ack 19, win 265, options [nop,nop,TS val 194793644 ecr
>> 194793280], length 8192
>> 00:44:27.749192 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
>> 56739, win 720, options [nop,nop,TS val 194793644 ecr 194793644],
>> length 0
>> 00:44:31.618124 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
>> 56739:64931, ack 19, win 265, options [nop,nop,TS val 194794031 ecr
>> 194793644], length 8192
>> 00:44:31.618166 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
>> 64931, win 720, options [nop,nop,TS val 194794031 ecr 194794031],
>> length 0
>> 00

Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
If it matters, the client is a JDBC app doing:

con = DriverManager.getConnection("jdbc:postgresql://localhost:5434/mydb",
"yang", password)
con.setAutoCommit(false)
val st = con.prepareStatement("""
  select user_id from den where user_id not in (select duid from
user_mappings)
  and timestamp between '2009-04-01'::date and '2010-04-01'::date;
""", ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
st.setFetchSize(8192)
rs = st.executeQuery()

On Thu, Sep 15, 2011 at 12:53 AM, Yang Zhang  wrote:
> I have a simple query that's been running for a while, which is fine,
> but it seems to be running very slowly, which is a problem:
>
> mydb=# explain select user_id from den where user_id not in (select
> duid from user_mappings) and timestamp between '2009-04-01' and
> '2010-04-01';
>
>           QUERY PLAN
> ---
>  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
>   Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without
> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp
> without time zone) AND (NOT (SubPlan 1)))
>   SubPlan 1
>     ->  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
>           ->  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
> width=4)
>
> user_mappings is fairly small:
>
> mydb=# select count(*) from user_mappings;
>  count
> ---
>  36780
> (1 row)
>
> The client is on the same LAN, but only sees packets trickling in:
>
> $ sudo tcpdump -i lo port 5434
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
> 00:44:05.475584 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
> 3757958746:3757966938, ack 3697417382, win 265, options [nop,nop,TS
> val 194791416 ecr 194791044], length 8192
> 00:44:05.475684 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
> 8192, win 771, options [nop,nop,TS val 194791416 ecr 194791416],
> length 0
> 00:44:08.867976 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
> 8192:15779, ack 1, win 265, options [nop,nop,TS val 194791756 ecr
> 194791416], length 7587
> 00:44:08.868019 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
> 15779, win 771, options [nop,nop,TS val 194791756 ecr 194791756],
> length 0
> 00:44:08.870336 IP nuthouse.40349 > nuthouse.5434: Flags [P.], seq
> 1:19, ack 15779, win 771, options [nop,nop,TS val 194791756 ecr
> 194791756], length 18
> 00:44:08.870343 IP nuthouse.5434 > nuthouse.40349: Flags [.], ack 19,
> win 265, options [nop,nop,TS val 194791756 ecr 194791756], length 0
> 00:44:12.578220 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
> 15779:23971, ack 19, win 265, options [nop,nop,TS val 194792127 ecr
> 194791756], length 8192
> 00:44:12.618116 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
> 23971, win 771, options [nop,nop,TS val 194792131 ecr 194792127],
> length 0
> 00:44:16.664645 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
> 23971:32163, ack 19, win 265, options [nop,nop,TS val 194792535 ecr
> 194792131], length 8192
> 00:44:16.664755 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
> 32163, win 771, options [nop,nop,TS val 194792535 ecr 194792535],
> length 0
> 00:44:20.465773 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
> 32163:40355, ack 19, win 265, options [nop,nop,TS val 194792915 ecr
> 194792535], length 8192
> 00:44:20.465878 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
> 40355, win 720, options [nop,nop,TS val 194792915 ecr 194792915],
> length 0
> 00:44:24.115273 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
> 40355:48547, ack 19, win 265, options [nop,nop,TS val 194793280 ecr
> 194792915], length 8192
> 00:44:24.115380 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
> 48547, win 720, options [nop,nop,TS val 194793280 ecr 194793280],
> length 0
> 00:44:27.749084 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
> 48547:56739, ack 19, win 265, options [nop,nop,TS val 194793644 ecr
> 194793280], length 8192
> 00:44:27.749192 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
> 56739, win 720, options [nop,nop,TS val 194793644 ecr 194793644],
> length 0
> 00:44:31.618124 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
> 56739:64931, ack 19, win 265, options [nop,nop,TS val 194794031 ecr
> 194793644], length 8192
> 00:44:31.618166 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
> 64931, win 720, options [nop,nop,TS val 194794031 ecr 194794031],
> length 0
> 00:44:35.349958 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
> 64931:73123, ack 19, win 265, options [nop,nop,TS val 194794404 ecr
> 194794031], length 8192
> 00:44:35.350054 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
> 73123, win 720, options [nop,nop,TS val 194794404 ecr 194794404],
> length 0
> 00:44:39.311627 IP nuthouse.5434 > nuthouse.40349: Fl