Re: [GENERAL] Why is this query running slowly?
"Tomas Vondra" writes: > Could the planner do this reasoning, i.e. see if there's a NOT NULL > constraint on the column and change the plan accordingly? Possibly. I'm not sure how expensive it would be. Right now, a structural change of that sort would have to be done pretty early in planning, and we don't have a lot of info collected about the tables at that point. I'm afraid it would involve a special-purpose catalog probe, which would add a lot of overhead. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query running slowly?
On 15 Září 2011, 16:26, Tom Lane wrote: > Yang Zhang writes: >> On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra wrote: >>> [ use a left join instead of NOT IN ] > >> This worked great, thank you. Too bad the planner isn't smart enough >> to do this yet! > > It never will be, because they're not equivalent queries. NOT IN has > different (and very hard-to-optimize) behavior in the face of NULLs in > the right-hand table. True, but in this case the subquery selects a PK column (at least I believe so, we haven't seen the structure), so there should be no NULLs. So the result should be the same. Could the planner do this reasoning, i.e. see if there's a NOT NULL constraint on the column and change the plan accordingly? Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query running slowly?
Yang Zhang writes: > On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra wrote: >> [ use a left join instead of NOT IN ] > This worked great, thank you. Too bad the planner isn't smart enough > to do this yet! It never will be, because they're not equivalent queries. NOT IN has different (and very hard-to-optimize) behavior in the face of NULLs in the right-hand table. The moral of the story is avoid NOT IN. You could use NOT EXISTS, which is less of a notational change than recasting as a left join. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query running slowly?
On 15 Září 2011, 11:07, Yang Zhang wrote: > On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra wrote: >> On 15 Září 2011, 9:53, Yang Zhang wrote: >>> I have a simple query that's been running for a while, which is fine, >>> but it seems to be running very slowly, which is a problem: >>> >>> mydb=# explain select user_id from den where user_id not in (select >>> duid from user_mappings) and timestamp between '2009-04-01' and >>> '2010-04-01'; >>> >>> QUERY PLAN >>> --- >>> Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4) >>> Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without >>> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp >>> without time zone) AND (NOT (SubPlan 1))) >>> SubPlan 1 >>> -> Materialize (cost=711.58..1223.38 rows=36780 width=4) >>> -> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780 >>> width=4) >>> >>> user_mappings is fairly small: >> >> The problem is that for each of the 22634720 rows in "den" a separate >> uncorrelated subquery (a seq scan on user_mappings) has to be executed. >> Althogh the subquery is not very expensive, multiplied by the number of >> rows in "den" the total cost is extreme. >> >> The only solution is to get rid of the "not in" subquery - try to turn >> it >> to a join like this: >> >> SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid) >> WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01') >> AND (duid IS NULL) >> >> That should give the same result I guess. > > This worked great, thank you. Too bad the planner isn't smart enough > to do this yet! It's probably a bit more complicated I guess - there are probably cases when this would not work this great. Anyway, you should consider Toby Corkindale's recommendation and check if there's an index on that timestamp column - that might further improve the performance. (I don't think an index on user_id might be useful in this case). Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query running slowly?
On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra wrote: > On 15 Září 2011, 9:53, Yang Zhang wrote: >> I have a simple query that's been running for a while, which is fine, >> but it seems to be running very slowly, which is a problem: >> >> mydb=# explain select user_id from den where user_id not in (select >> duid from user_mappings) and timestamp between '2009-04-01' and >> '2010-04-01'; >> >> QUERY PLAN >> --- >> Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4) >> Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without >> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp >> without time zone) AND (NOT (SubPlan 1))) >> SubPlan 1 >> -> Materialize (cost=711.58..1223.38 rows=36780 width=4) >> -> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780 >> width=4) >> >> user_mappings is fairly small: > > The problem is that for each of the 22634720 rows in "den" a separate > uncorrelated subquery (a seq scan on user_mappings) has to be executed. > Althogh the subquery is not very expensive, multiplied by the number of > rows in "den" the total cost is extreme. > > The only solution is to get rid of the "not in" subquery - try to turn it > to a join like this: > > SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid) > WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01') > AND (duid IS NULL) > > That should give the same result I guess. This worked great, thank you. Too bad the planner isn't smart enough to do this yet! > > Tomas > > -- Yang Zhang http://yz.mit.edu/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query running slowly?
On 15/09/11 17:53, Yang Zhang wrote: I have a simple query that's been running for a while, which is fine, but it seems to be running very slowly, which is a problem: mydb=# explain select user_id from den where user_id not in (select duid from user_mappings) and timestamp between '2009-04-01' and '2010-04-01'; Do you have indexes on user_id and timestamp? QUERY PLAN --- Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4) Filter: (("timestamp">= '2009-04-01 00:00:00'::timestamp without time zone) AND ("timestamp"<= '2010-04-01 00:00:00'::timestamp without time zone) AND (NOT (SubPlan 1))) SubPlan 1 -> Materialize (cost=711.58..1223.38 rows=36780 width=4) -> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780 width=4) user_mappings is fairly small: mydb=# select count(*) from user_mappings; count --- 36780 (1 row) I know that `den` is large but things seem to be running much slower than I'd expect. Nothing in the logs. Is there any way to inspect what's going on? Hesitant to kill the query in case it's almost done, though I doubt it (didn't have the foresight to expose this information in the client process - wasn't expecting to run into this). Thanks in advance. -- .signature -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query running slowly?
On 15 Září 2011, 9:53, Yang Zhang wrote: > I have a simple query that's been running for a while, which is fine, > but it seems to be running very slowly, which is a problem: > > mydb=# explain select user_id from den where user_id not in (select > duid from user_mappings) and timestamp between '2009-04-01' and > '2010-04-01'; > >QUERY PLAN > --- > Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4) >Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without > time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp > without time zone) AND (NOT (SubPlan 1))) >SubPlan 1 > -> Materialize (cost=711.58..1223.38 rows=36780 width=4) >-> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780 > width=4) > > user_mappings is fairly small: The problem is that for each of the 22634720 rows in "den" a separate uncorrelated subquery (a seq scan on user_mappings) has to be executed. Althogh the subquery is not very expensive, multiplied by the number of rows in "den" the total cost is extreme. The only solution is to get rid of the "not in" subquery - try to turn it to a join like this: SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid) WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01') AND (duid IS NULL) That should give the same result I guess. Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query running slowly?
On 09/15/11 12:53 AM, Yang Zhang wrote: mydb=# explain select user_id from den where user_id not in (select duid from user_mappings) and timestamp between '2009-04-01' and '2010-04-01'; QUERY PLAN --- Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4) Filter: (("timestamp">= '2009-04-01 00:00:00'::timestamp without time zone) AND ("timestamp"<= '2010-04-01 00:00:00'::timestamp without time zone) AND (NOT (SubPlan 1))) SubPlan 1 -> Materialize (cost=711.58..1223.38 rows=36780 width=4) -> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780 width=4) user_mappings is fairly small: so you have to check if every one of the 22 million rows of 'den' in that time range and is not one of the 36000 rows of user_mappings? how many rows are in that time range? are timestamp and duid indexed? have the tables been analyzed any time recently? -- john r pierceN 37, W 122 santa cruz ca mid-left coast -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query running slowly?
Should also add that while the client is under no load (CPU<1%, load ~0.1, mem ~20%), the server looks pretty busy (CPU 90-100% of 1 core, load ~1.5, mem ~70-80%), but PG is the only thing using resources. On Thu, Sep 15, 2011 at 12:58 AM, Yang Zhang wrote: > If it matters, the client is a JDBC app doing: > > con = DriverManager.getConnection("jdbc:postgresql://localhost:5434/mydb", > "yang", password) > con.setAutoCommit(false) > val st = con.prepareStatement(""" > select user_id from den where user_id not in (select duid from > user_mappings) > and timestamp between '2009-04-01'::date and '2010-04-01'::date; > """, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY) > st.setFetchSize(8192) > rs = st.executeQuery() > > On Thu, Sep 15, 2011 at 12:53 AM, Yang Zhang wrote: >> I have a simple query that's been running for a while, which is fine, >> but it seems to be running very slowly, which is a problem: >> >> mydb=# explain select user_id from den where user_id not in (select >> duid from user_mappings) and timestamp between '2009-04-01' and >> '2010-04-01'; >> >> QUERY PLAN >> --- >> Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4) >> Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without >> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp >> without time zone) AND (NOT (SubPlan 1))) >> SubPlan 1 >> -> Materialize (cost=711.58..1223.38 rows=36780 width=4) >> -> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780 >> width=4) >> >> user_mappings is fairly small: >> >> mydb=# select count(*) from user_mappings; >> count >> --- >> 36780 >> (1 row) >> >> The client is on the same LAN, but only sees packets trickling in: >> >> $ sudo tcpdump -i lo port 5434 >> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode >> listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes >> 00:44:05.475584 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq >> 3757958746:3757966938, ack 3697417382, win 265, options [nop,nop,TS >> val 194791416 ecr 194791044], length 8192 >> 00:44:05.475684 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack >> 8192, win 771, options [nop,nop,TS val 194791416 ecr 194791416], >> length 0 >> 00:44:08.867976 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq >> 8192:15779, ack 1, win 265, options [nop,nop,TS val 194791756 ecr >> 194791416], length 7587 >> 00:44:08.868019 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack >> 15779, win 771, options [nop,nop,TS val 194791756 ecr 194791756], >> length 0 >> 00:44:08.870336 IP nuthouse.40349 > nuthouse.5434: Flags [P.], seq >> 1:19, ack 15779, win 771, options [nop,nop,TS val 194791756 ecr >> 194791756], length 18 >> 00:44:08.870343 IP nuthouse.5434 > nuthouse.40349: Flags [.], ack 19, >> win 265, options [nop,nop,TS val 194791756 ecr 194791756], length 0 >> 00:44:12.578220 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq >> 15779:23971, ack 19, win 265, options [nop,nop,TS val 194792127 ecr >> 194791756], length 8192 >> 00:44:12.618116 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack >> 23971, win 771, options [nop,nop,TS val 194792131 ecr 194792127], >> length 0 >> 00:44:16.664645 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq >> 23971:32163, ack 19, win 265, options [nop,nop,TS val 194792535 ecr >> 194792131], length 8192 >> 00:44:16.664755 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack >> 32163, win 771, options [nop,nop,TS val 194792535 ecr 194792535], >> length 0 >> 00:44:20.465773 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq >> 32163:40355, ack 19, win 265, options [nop,nop,TS val 194792915 ecr >> 194792535], length 8192 >> 00:44:20.465878 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack >> 40355, win 720, options [nop,nop,TS val 194792915 ecr 194792915], >> length 0 >> 00:44:24.115273 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq >> 40355:48547, ack 19, win 265, options [nop,nop,TS val 194793280 ecr >> 194792915], length 8192 >> 00:44:24.115380 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack >> 48547, win 720, options [nop,nop,TS val 194793280 ecr 194793280], >> length 0 >> 00:44:27.749084 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq >> 48547:56739, ack 19, win 265, options [nop,nop,TS val 194793644 ecr >> 194793280], length 8192 >> 00:44:27.749192 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack >> 56739, win 720, options [nop,nop,TS val 194793644 ecr 194793644], >> length 0 >> 00:44:31.618124 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq >> 56739:64931, ack 19, win 265, options [nop,nop,TS val 194794031 ecr >> 194793644], length 8192 >> 00:44:31.618166 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack >> 64931, win 720, options [nop,nop,TS val 194794031 ecr 194794031], >> length 0 >> 00
Re: [GENERAL] Why is this query running slowly?
If it matters, the client is a JDBC app doing: con = DriverManager.getConnection("jdbc:postgresql://localhost:5434/mydb", "yang", password) con.setAutoCommit(false) val st = con.prepareStatement(""" select user_id from den where user_id not in (select duid from user_mappings) and timestamp between '2009-04-01'::date and '2010-04-01'::date; """, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY) st.setFetchSize(8192) rs = st.executeQuery() On Thu, Sep 15, 2011 at 12:53 AM, Yang Zhang wrote: > I have a simple query that's been running for a while, which is fine, > but it seems to be running very slowly, which is a problem: > > mydb=# explain select user_id from den where user_id not in (select > duid from user_mappings) and timestamp between '2009-04-01' and > '2010-04-01'; > > QUERY PLAN > --- > Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4) > Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without > time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp > without time zone) AND (NOT (SubPlan 1))) > SubPlan 1 > -> Materialize (cost=711.58..1223.38 rows=36780 width=4) > -> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780 > width=4) > > user_mappings is fairly small: > > mydb=# select count(*) from user_mappings; > count > --- > 36780 > (1 row) > > The client is on the same LAN, but only sees packets trickling in: > > $ sudo tcpdump -i lo port 5434 > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes > 00:44:05.475584 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq > 3757958746:3757966938, ack 3697417382, win 265, options [nop,nop,TS > val 194791416 ecr 194791044], length 8192 > 00:44:05.475684 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack > 8192, win 771, options [nop,nop,TS val 194791416 ecr 194791416], > length 0 > 00:44:08.867976 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq > 8192:15779, ack 1, win 265, options [nop,nop,TS val 194791756 ecr > 194791416], length 7587 > 00:44:08.868019 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack > 15779, win 771, options [nop,nop,TS val 194791756 ecr 194791756], > length 0 > 00:44:08.870336 IP nuthouse.40349 > nuthouse.5434: Flags [P.], seq > 1:19, ack 15779, win 771, options [nop,nop,TS val 194791756 ecr > 194791756], length 18 > 00:44:08.870343 IP nuthouse.5434 > nuthouse.40349: Flags [.], ack 19, > win 265, options [nop,nop,TS val 194791756 ecr 194791756], length 0 > 00:44:12.578220 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq > 15779:23971, ack 19, win 265, options [nop,nop,TS val 194792127 ecr > 194791756], length 8192 > 00:44:12.618116 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack > 23971, win 771, options [nop,nop,TS val 194792131 ecr 194792127], > length 0 > 00:44:16.664645 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq > 23971:32163, ack 19, win 265, options [nop,nop,TS val 194792535 ecr > 194792131], length 8192 > 00:44:16.664755 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack > 32163, win 771, options [nop,nop,TS val 194792535 ecr 194792535], > length 0 > 00:44:20.465773 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq > 32163:40355, ack 19, win 265, options [nop,nop,TS val 194792915 ecr > 194792535], length 8192 > 00:44:20.465878 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack > 40355, win 720, options [nop,nop,TS val 194792915 ecr 194792915], > length 0 > 00:44:24.115273 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq > 40355:48547, ack 19, win 265, options [nop,nop,TS val 194793280 ecr > 194792915], length 8192 > 00:44:24.115380 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack > 48547, win 720, options [nop,nop,TS val 194793280 ecr 194793280], > length 0 > 00:44:27.749084 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq > 48547:56739, ack 19, win 265, options [nop,nop,TS val 194793644 ecr > 194793280], length 8192 > 00:44:27.749192 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack > 56739, win 720, options [nop,nop,TS val 194793644 ecr 194793644], > length 0 > 00:44:31.618124 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq > 56739:64931, ack 19, win 265, options [nop,nop,TS val 194794031 ecr > 194793644], length 8192 > 00:44:31.618166 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack > 64931, win 720, options [nop,nop,TS val 194794031 ecr 194794031], > length 0 > 00:44:35.349958 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq > 64931:73123, ack 19, win 265, options [nop,nop,TS val 194794404 ecr > 194794031], length 8192 > 00:44:35.350054 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack > 73123, win 720, options [nop,nop,TS val 194794404 ecr 194794404], > length 0 > 00:44:39.311627 IP nuthouse.5434 > nuthouse.40349: Fl