[GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
I have a simple query that's been running for a while, which is fine,
but it seems to be running very slowly, which is a problem:

mydb=# explain select user_id from den where user_id not in (select
duid from user_mappings) and timestamp between '2009-04-01' and
'2010-04-01';

   QUERY PLAN
---
 Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
   Filter: ((timestamp = '2009-04-01 00:00:00'::timestamp without
time zone) AND (timestamp = '2010-04-01 00:00:00'::timestamp
without time zone) AND (NOT (SubPlan 1)))
   SubPlan 1
 -  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
   -  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 width=4)

user_mappings is fairly small:

mydb=# select count(*) from user_mappings;
 count
---
 36780
(1 row)

The client is on the same LAN, but only sees packets trickling in:

$ sudo tcpdump -i lo port 5434
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
00:44:05.475584 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
3757958746:3757966938, ack 3697417382, win 265, options [nop,nop,TS
val 194791416 ecr 194791044], length 8192
00:44:05.475684 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
8192, win 771, options [nop,nop,TS val 194791416 ecr 194791416],
length 0
00:44:08.867976 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
8192:15779, ack 1, win 265, options [nop,nop,TS val 194791756 ecr
194791416], length 7587
00:44:08.868019 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
15779, win 771, options [nop,nop,TS val 194791756 ecr 194791756],
length 0
00:44:08.870336 IP nuthouse.40349  nuthouse.5434: Flags [P.], seq
1:19, ack 15779, win 771, options [nop,nop,TS val 194791756 ecr
194791756], length 18
00:44:08.870343 IP nuthouse.5434  nuthouse.40349: Flags [.], ack 19,
win 265, options [nop,nop,TS val 194791756 ecr 194791756], length 0
00:44:12.578220 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
15779:23971, ack 19, win 265, options [nop,nop,TS val 194792127 ecr
194791756], length 8192
00:44:12.618116 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
23971, win 771, options [nop,nop,TS val 194792131 ecr 194792127],
length 0
00:44:16.664645 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
23971:32163, ack 19, win 265, options [nop,nop,TS val 194792535 ecr
194792131], length 8192
00:44:16.664755 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
32163, win 771, options [nop,nop,TS val 194792535 ecr 194792535],
length 0
00:44:20.465773 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
32163:40355, ack 19, win 265, options [nop,nop,TS val 194792915 ecr
194792535], length 8192
00:44:20.465878 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
40355, win 720, options [nop,nop,TS val 194792915 ecr 194792915],
length 0
00:44:24.115273 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
40355:48547, ack 19, win 265, options [nop,nop,TS val 194793280 ecr
194792915], length 8192
00:44:24.115380 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
48547, win 720, options [nop,nop,TS val 194793280 ecr 194793280],
length 0
00:44:27.749084 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
48547:56739, ack 19, win 265, options [nop,nop,TS val 194793644 ecr
194793280], length 8192
00:44:27.749192 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
56739, win 720, options [nop,nop,TS val 194793644 ecr 194793644],
length 0
00:44:31.618124 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
56739:64931, ack 19, win 265, options [nop,nop,TS val 194794031 ecr
194793644], length 8192
00:44:31.618166 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
64931, win 720, options [nop,nop,TS val 194794031 ecr 194794031],
length 0
00:44:35.349958 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
64931:73123, ack 19, win 265, options [nop,nop,TS val 194794404 ecr
194794031], length 8192
00:44:35.350054 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
73123, win 720, options [nop,nop,TS val 194794404 ecr 194794404],
length 0
00:44:39.311627 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
73123:81315, ack 19, win 265, options [nop,nop,TS val 194794800 ecr
194794404], length 8192
00:44:39.311734 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
81315, win 771, options [nop,nop,TS val 194794800 ecr 194794800],
length 0
00:44:43.178319 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
81315:89507, ack 19, win 265, options [nop,nop,TS val 194795187 ecr
194794800], length 8192
00:44:43.178429 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
89507, win 771, options [nop,nop,TS val 194795187 ecr 194795187],
length 0
00:44:46.798511 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
89507:97699, ack 19, win 265, options [nop,nop,TS val 194795549 ecr
194795187], length 8192
00:44:46.798622 IP nuthouse.40349  

Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
If it matters, the client is a JDBC app doing:

con = DriverManager.getConnection(jdbc:postgresql://localhost:5434/mydb,
yang, password)
con.setAutoCommit(false)
val st = con.prepareStatement(
  select user_id from den where user_id not in (select duid from
user_mappings)
  and timestamp between '2009-04-01'::date and '2010-04-01'::date;
, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
st.setFetchSize(8192)
rs = st.executeQuery()

On Thu, Sep 15, 2011 at 12:53 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have a simple query that's been running for a while, which is fine,
 but it seems to be running very slowly, which is a problem:

 mydb=# explain select user_id from den where user_id not in (select
 duid from user_mappings) and timestamp between '2009-04-01' and
 '2010-04-01';

           QUERY PLAN
 ---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
   Filter: ((timestamp = '2009-04-01 00:00:00'::timestamp without
 time zone) AND (timestamp = '2010-04-01 00:00:00'::timestamp
 without time zone) AND (NOT (SubPlan 1)))
   SubPlan 1
     -  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
           -  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
 width=4)

 user_mappings is fairly small:

 mydb=# select count(*) from user_mappings;
  count
 ---
  36780
 (1 row)

 The client is on the same LAN, but only sees packets trickling in:

 $ sudo tcpdump -i lo port 5434
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
 00:44:05.475584 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 3757958746:3757966938, ack 3697417382, win 265, options [nop,nop,TS
 val 194791416 ecr 194791044], length 8192
 00:44:05.475684 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 8192, win 771, options [nop,nop,TS val 194791416 ecr 194791416],
 length 0
 00:44:08.867976 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 8192:15779, ack 1, win 265, options [nop,nop,TS val 194791756 ecr
 194791416], length 7587
 00:44:08.868019 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 15779, win 771, options [nop,nop,TS val 194791756 ecr 194791756],
 length 0
 00:44:08.870336 IP nuthouse.40349  nuthouse.5434: Flags [P.], seq
 1:19, ack 15779, win 771, options [nop,nop,TS val 194791756 ecr
 194791756], length 18
 00:44:08.870343 IP nuthouse.5434  nuthouse.40349: Flags [.], ack 19,
 win 265, options [nop,nop,TS val 194791756 ecr 194791756], length 0
 00:44:12.578220 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 15779:23971, ack 19, win 265, options [nop,nop,TS val 194792127 ecr
 194791756], length 8192
 00:44:12.618116 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 23971, win 771, options [nop,nop,TS val 194792131 ecr 194792127],
 length 0
 00:44:16.664645 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 23971:32163, ack 19, win 265, options [nop,nop,TS val 194792535 ecr
 194792131], length 8192
 00:44:16.664755 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 32163, win 771, options [nop,nop,TS val 194792535 ecr 194792535],
 length 0
 00:44:20.465773 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 32163:40355, ack 19, win 265, options [nop,nop,TS val 194792915 ecr
 194792535], length 8192
 00:44:20.465878 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 40355, win 720, options [nop,nop,TS val 194792915 ecr 194792915],
 length 0
 00:44:24.115273 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 40355:48547, ack 19, win 265, options [nop,nop,TS val 194793280 ecr
 194792915], length 8192
 00:44:24.115380 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 48547, win 720, options [nop,nop,TS val 194793280 ecr 194793280],
 length 0
 00:44:27.749084 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 48547:56739, ack 19, win 265, options [nop,nop,TS val 194793644 ecr
 194793280], length 8192
 00:44:27.749192 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 56739, win 720, options [nop,nop,TS val 194793644 ecr 194793644],
 length 0
 00:44:31.618124 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 56739:64931, ack 19, win 265, options [nop,nop,TS val 194794031 ecr
 194793644], length 8192
 00:44:31.618166 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 64931, win 720, options [nop,nop,TS val 194794031 ecr 194794031],
 length 0
 00:44:35.349958 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 64931:73123, ack 19, win 265, options [nop,nop,TS val 194794404 ecr
 194794031], length 8192
 00:44:35.350054 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 73123, win 720, options [nop,nop,TS val 194794404 ecr 194794404],
 length 0
 00:44:39.311627 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 73123:81315, ack 19, win 265, options [nop,nop,TS val 194794800 ecr
 194794404], length 8192

Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
Should also add that while the client is under no load (CPU1%, load
~0.1, mem ~20%), the server looks pretty busy (CPU 90-100% of 1 core,
load ~1.5, mem ~70-80%), but PG is the only thing using resources.

On Thu, Sep 15, 2011 at 12:58 AM, Yang Zhang yanghates...@gmail.com wrote:
 If it matters, the client is a JDBC app doing:

    con = DriverManager.getConnection(jdbc:postgresql://localhost:5434/mydb,
 yang, password)
    con.setAutoCommit(false)
    val st = con.prepareStatement(
      select user_id from den where user_id not in (select duid from
 user_mappings)
      and timestamp between '2009-04-01'::date and '2010-04-01'::date;
    , ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
    st.setFetchSize(8192)
    rs = st.executeQuery()

 On Thu, Sep 15, 2011 at 12:53 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have a simple query that's been running for a while, which is fine,
 but it seems to be running very slowly, which is a problem:

 mydb=# explain select user_id from den where user_id not in (select
 duid from user_mappings) and timestamp between '2009-04-01' and
 '2010-04-01';

           QUERY PLAN
 ---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
   Filter: ((timestamp = '2009-04-01 00:00:00'::timestamp without
 time zone) AND (timestamp = '2010-04-01 00:00:00'::timestamp
 without time zone) AND (NOT (SubPlan 1)))
   SubPlan 1
     -  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
           -  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
 width=4)

 user_mappings is fairly small:

 mydb=# select count(*) from user_mappings;
  count
 ---
  36780
 (1 row)

 The client is on the same LAN, but only sees packets trickling in:

 $ sudo tcpdump -i lo port 5434
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
 00:44:05.475584 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 3757958746:3757966938, ack 3697417382, win 265, options [nop,nop,TS
 val 194791416 ecr 194791044], length 8192
 00:44:05.475684 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 8192, win 771, options [nop,nop,TS val 194791416 ecr 194791416],
 length 0
 00:44:08.867976 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 8192:15779, ack 1, win 265, options [nop,nop,TS val 194791756 ecr
 194791416], length 7587
 00:44:08.868019 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 15779, win 771, options [nop,nop,TS val 194791756 ecr 194791756],
 length 0
 00:44:08.870336 IP nuthouse.40349  nuthouse.5434: Flags [P.], seq
 1:19, ack 15779, win 771, options [nop,nop,TS val 194791756 ecr
 194791756], length 18
 00:44:08.870343 IP nuthouse.5434  nuthouse.40349: Flags [.], ack 19,
 win 265, options [nop,nop,TS val 194791756 ecr 194791756], length 0
 00:44:12.578220 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 15779:23971, ack 19, win 265, options [nop,nop,TS val 194792127 ecr
 194791756], length 8192
 00:44:12.618116 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 23971, win 771, options [nop,nop,TS val 194792131 ecr 194792127],
 length 0
 00:44:16.664645 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 23971:32163, ack 19, win 265, options [nop,nop,TS val 194792535 ecr
 194792131], length 8192
 00:44:16.664755 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 32163, win 771, options [nop,nop,TS val 194792535 ecr 194792535],
 length 0
 00:44:20.465773 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 32163:40355, ack 19, win 265, options [nop,nop,TS val 194792915 ecr
 194792535], length 8192
 00:44:20.465878 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 40355, win 720, options [nop,nop,TS val 194792915 ecr 194792915],
 length 0
 00:44:24.115273 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 40355:48547, ack 19, win 265, options [nop,nop,TS val 194793280 ecr
 194792915], length 8192
 00:44:24.115380 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 48547, win 720, options [nop,nop,TS val 194793280 ecr 194793280],
 length 0
 00:44:27.749084 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 48547:56739, ack 19, win 265, options [nop,nop,TS val 194793644 ecr
 194793280], length 8192
 00:44:27.749192 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 56739, win 720, options [nop,nop,TS val 194793644 ecr 194793644],
 length 0
 00:44:31.618124 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 56739:64931, ack 19, win 265, options [nop,nop,TS val 194794031 ecr
 194793644], length 8192
 00:44:31.618166 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 64931, win 720, options [nop,nop,TS val 194794031 ecr 194794031],
 length 0
 00:44:35.349958 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 64931:73123, ack 19, win 265, options [nop,nop,TS val 194794404 ecr
 194794031], length 8192
 00:44:35.350054 

Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread John R Pierce

On 09/15/11 12:53 AM, Yang Zhang wrote:

mydb=# explain select user_id from den where user_id not in (select
duid from user_mappings) and timestamp between '2009-04-01' and
'2010-04-01';

QUERY PLAN
---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
Filter: ((timestamp= '2009-04-01 00:00:00'::timestamp without
time zone) AND (timestamp= '2010-04-01 00:00:00'::timestamp
without time zone) AND (NOT (SubPlan 1)))
SubPlan 1
  -   Materialize  (cost=711.58..1223.38 rows=36780 width=4)
-   Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
width=4)

user_mappings is fairly small:


so you have to check if every one of the 22 million rows of 'den' in 
that time range and is not one of the 36000 rows of user_mappings?


how many rows are in that time range?

are timestamp and duid indexed?  have the tables been analyzed any time 
recently?



--
john r pierceN 37, W 122
santa cruz ca mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tomas Vondra
On 15 Září 2011, 9:53, Yang Zhang wrote:
 I have a simple query that's been running for a while, which is fine,
 but it seems to be running very slowly, which is a problem:

 mydb=# explain select user_id from den where user_id not in (select
 duid from user_mappings) and timestamp between '2009-04-01' and
 '2010-04-01';

QUERY PLAN
 ---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
Filter: ((timestamp = '2009-04-01 00:00:00'::timestamp without
 time zone) AND (timestamp = '2010-04-01 00:00:00'::timestamp
 without time zone) AND (NOT (SubPlan 1)))
SubPlan 1
  -  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
-  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780
 width=4)

 user_mappings is fairly small:

The problem is that for each of the 22634720 rows in den a separate
uncorrelated subquery (a seq scan on user_mappings) has to be executed.
Althogh the subquery is not very expensive, multiplied by the number of
rows in den the total cost is extreme.

The only solution is to get rid of the not in subquery - try to turn it
to a join like this:

SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid)
WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01')
  AND (duid IS NULL)

That should give the same result I guess.

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Toby Corkindale

On 15/09/11 17:53, Yang Zhang wrote:

I have a simple query that's been running for a while, which is fine,
but it seems to be running very slowly, which is a problem:

mydb=# explain select user_id from den where user_id not in (select
duid from user_mappings) and timestamp between '2009-04-01' and
'2010-04-01';


Do you have indexes on user_id and timestamp?



QUERY PLAN
---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
Filter: ((timestamp= '2009-04-01 00:00:00'::timestamp without
time zone) AND (timestamp= '2010-04-01 00:00:00'::timestamp
without time zone) AND (NOT (SubPlan 1)))
SubPlan 1
  -   Materialize  (cost=711.58..1223.38 rows=36780 width=4)
-   Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
width=4)

user_mappings is fairly small:

mydb=# select count(*) from user_mappings;
  count
---
  36780
(1 row)


I know that `den` is large but things seem to be running much slower
than I'd expect.  Nothing in the logs.  Is there any way to inspect
what's going on?  Hesitant to kill the query in case it's almost done,
though I doubt it (didn't have the foresight to expose this
information in the client process - wasn't expecting to run into
this).  Thanks in advance.




--
.signature

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 15 Září 2011, 9:53, Yang Zhang wrote:
 I have a simple query that's been running for a while, which is fine,
 but it seems to be running very slowly, which is a problem:

 mydb=# explain select user_id from den where user_id not in (select
 duid from user_mappings) and timestamp between '2009-04-01' and
 '2010-04-01';

            QUERY PLAN
 ---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
    Filter: ((timestamp = '2009-04-01 00:00:00'::timestamp without
 time zone) AND (timestamp = '2010-04-01 00:00:00'::timestamp
 without time zone) AND (NOT (SubPlan 1)))
    SubPlan 1
      -  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
            -  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780
 width=4)

 user_mappings is fairly small:

 The problem is that for each of the 22634720 rows in den a separate
 uncorrelated subquery (a seq scan on user_mappings) has to be executed.
 Althogh the subquery is not very expensive, multiplied by the number of
 rows in den the total cost is extreme.

 The only solution is to get rid of the not in subquery - try to turn it
 to a join like this:

 SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid)
 WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01')
  AND (duid IS NULL)

 That should give the same result I guess.

This worked great, thank you.  Too bad the planner isn't smart enough
to do this yet!


 Tomas





-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tomas Vondra
On 15 Září 2011, 11:07, Yang Zhang wrote:
 On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 15 Září 2011, 9:53, Yang Zhang wrote:
 I have a simple query that's been running for a while, which is fine,
 but it seems to be running very slowly, which is a problem:

 mydb=# explain select user_id from den where user_id not in (select
 duid from user_mappings) and timestamp between '2009-04-01' and
 '2010-04-01';

            QUERY PLAN
 ---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
    Filter: ((timestamp = '2009-04-01 00:00:00'::timestamp without
 time zone) AND (timestamp = '2010-04-01 00:00:00'::timestamp
 without time zone) AND (NOT (SubPlan 1)))
    SubPlan 1
      -  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
            -  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780
 width=4)

 user_mappings is fairly small:

 The problem is that for each of the 22634720 rows in den a separate
 uncorrelated subquery (a seq scan on user_mappings) has to be executed.
 Althogh the subquery is not very expensive, multiplied by the number of
 rows in den the total cost is extreme.

 The only solution is to get rid of the not in subquery - try to turn
 it
 to a join like this:

 SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid)
 WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01')
  AND (duid IS NULL)

 That should give the same result I guess.

 This worked great, thank you.  Too bad the planner isn't smart enough
 to do this yet!

It's probably a bit more complicated I guess - there are probably cases
when this would not work this great.

Anyway, you should consider Toby Corkindale's recommendation and check if
there's an index on that timestamp column - that might further improve the
performance. (I don't think an index on user_id might be useful in this
case).

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tom Lane
Yang Zhang yanghates...@gmail.com writes:
 On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra t...@fuzzy.cz wrote:
 [ use a left join instead of NOT IN ]

 This worked great, thank you.  Too bad the planner isn't smart enough
 to do this yet!

It never will be, because they're not equivalent queries.  NOT IN has
different (and very hard-to-optimize) behavior in the face of NULLs in
the right-hand table.

The moral of the story is avoid NOT IN.  You could use NOT EXISTS,
which is less of a notational change than recasting as a left join.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tomas Vondra
On 15 Září 2011, 16:26, Tom Lane wrote:
 Yang Zhang yanghates...@gmail.com writes:
 On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra t...@fuzzy.cz wrote:
 [ use a left join instead of NOT IN ]

 This worked great, thank you.  Too bad the planner isn't smart enough
 to do this yet!

 It never will be, because they're not equivalent queries.  NOT IN has
 different (and very hard-to-optimize) behavior in the face of NULLs in
 the right-hand table.

True, but in this case the subquery selects a PK column (at least I
believe so, we haven't seen the structure), so there should be no NULLs.
So the result should be the same.

Could the planner do this reasoning, i.e. see if there's a NOT NULL
constraint on the column and change the plan accordingly?

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 Could the planner do this reasoning, i.e. see if there's a NOT NULL
 constraint on the column and change the plan accordingly?

Possibly.  I'm not sure how expensive it would be.  Right now, a
structural change of that sort would have to be done pretty early in
planning, and we don't have a lot of info collected about the tables
at that point.  I'm afraid it would involve a special-purpose catalog
probe, which would add a lot of overhead.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general