[SQL] Very large IN-clause is slow, but how to rewrite it?

2007-02-25 Thread Richard Jones
I've been profiling a PG database / mix of applications and found that
one statement which takes a very long time to execute is:

select e.keywordid, e.quantity, e.max_cpc, i.position
 from bid3_events_impressions i, bid3_events e
where i.eventid = e.id and e.keywordid in ($1,$2,$3,$4,$5,$6,$7,
$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,
$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,
$45,$46,$47,$48,
[... placeholders $49 thru $1908 omitted ...]
$1909,$1910,$1911,$1912,$1913,$1914,$1915,$1916,$1917,$1918,$1919,$1920,
$1921,$1922,$1923,$1924,$1925,$1926,$1927,$1928,$1929,$1930,$1931,$1932)
order by e.keywordid, e.creativeid, e.t

Needless to say this statement is being generated programatically.

The problem is that the code needs to execute selects of this sort on
various different number of keyword IDs quite frequently.  I'm not
sure how to rewrite it.  If I put the keyword IDs into a temporary
table then it is not at all clear that the overhead of doing each
individual INSERT to populate the table won't be just as slow (the
database is located across a network so there is a significant RTT,
and COPY isn't supported by my PG lib).

Has anyone got any suggestions?

Rich.

--
explain select e.keywordid, e.quantity, e.max_cpc, i.position from 
bid3_events_impressions i, bid3_events e where i.eventid = e.id and e.keywordid 
in (1,2,3,4,5,6,7,8,9,10) order by e.keywordid, e.creativeid, e.t;  

  QUERY PLAN
 
---
 Sort  (cost=15795.79..15796.57 rows=312 width=34)
   Sort Key: e.keywordid, e.creativeid, e.t
   ->  Hash Join  (cost=11623.58..15782.87 rows=312 width=34)
 Hash Cond: ("outer".eventid = "inner".id)
 ->  Seq Scan on bid3_events_impressions i  (cost=0.00..3471.78 
rows=136878 width=8)
 ->  Hash  (cost=11622.35..11622.35 rows=489 width=34)
   ->  Seq Scan on bid3_events e  (cost=0.00..11622.35 rows=489 
width=34)
 Filter: ((keywordid = 1) OR (keywordid = 2) OR (keywordid 
= 3) OR (keywordid = 4) OR (keywordid = 5) OR (keywordid = 6) OR (keywordid = 
7) OR (keywordid = 8) OR (keywordid = 9) OR (keywordid = 10))
(8 rows)


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] Very large IN-clause is slow, but how to rewrite it?

2007-02-25 Thread Oleg Bartunov

Richard,

contrib/intarray may help you.

On Sun, 25 Feb 2007, Richard Jones wrote:


I've been profiling a PG database / mix of applications and found that
one statement which takes a very long time to execute is:

select e.keywordid, e.quantity, e.max_cpc, i.position
from bid3_events_impressions i, bid3_events e
   where i.eventid = e.id and e.keywordid in ($1,$2,$3,$4,$5,$6,$7,
$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,
$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,
$45,$46,$47,$48,
[... placeholders $49 thru $1908 omitted ...]
$1909,$1910,$1911,$1912,$1913,$1914,$1915,$1916,$1917,$1918,$1919,$1920,
$1921,$1922,$1923,$1924,$1925,$1926,$1927,$1928,$1929,$1930,$1931,$1932)
   order by e.keywordid, e.creativeid, e.t

Needless to say this statement is being generated programatically.

The problem is that the code needs to execute selects of this sort on
various different number of keyword IDs quite frequently.  I'm not
sure how to rewrite it.  If I put the keyword IDs into a temporary
table then it is not at all clear that the overhead of doing each
individual INSERT to populate the table won't be just as slow (the
database is located across a network so there is a significant RTT,
and COPY isn't supported by my PG lib).

Has anyone got any suggestions?

Rich.

--
explain select e.keywordid, e.quantity, e.max_cpc, i.position from 
bid3_events_impressions i, bid3_events e where i.eventid = e.id and e.keywordid 
in (1,2,3,4,5,6,7,8,9,10) order by e.keywordid, e.creativeid, e.t;  

  QUERY PLAN
---
Sort  (cost=15795.79..15796.57 rows=312 width=34)
  Sort Key: e.keywordid, e.creativeid, e.t
  ->  Hash Join  (cost=11623.58..15782.87 rows=312 width=34)
Hash Cond: ("outer".eventid = "inner".id)
->  Seq Scan on bid3_events_impressions i  (cost=0.00..3471.78 
rows=136878 width=8)
->  Hash  (cost=11622.35..11622.35 rows=489 width=34)
  ->  Seq Scan on bid3_events e  (cost=0.00..11622.35 rows=489 
width=34)
Filter: ((keywordid = 1) OR (keywordid = 2) OR (keywordid = 
3) OR (keywordid = 4) OR (keywordid = 5) OR (keywordid = 6) OR (keywordid = 7) 
OR (keywordid = 8) OR (keywordid = 9) OR (keywordid = 10))
(8 rows)


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [SQL] Very large IN-clause is slow, but how to rewrite it?

2007-02-25 Thread Tom Lane
Richard Jones <[EMAIL PROTECTED]> writes:
> I've been profiling a PG database / mix of applications and found that
> one statement which takes a very long time to execute is:

PG 8.2 does better with long IN-lists ... although if the list is so
long as to be fetching a significant fraction of the table, you'll still
have problems.  In that case I'd advise putting the values into a temp
table, ANALYZEing same, and doing "WHERE foo IN (SELECT x FROM tmp_table)".

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [SQL] Very large IN-clause is slow, but how to rewrite it?

2007-02-25 Thread Joe Conway

Tom Lane wrote:

Richard Jones <[EMAIL PROTECTED]> writes:

I've been profiling a PG database / mix of applications and found that
one statement which takes a very long time to execute is:


PG 8.2 does better with long IN-lists ... although if the list is so
long as to be fetching a significant fraction of the table, you'll still
have problems.  In that case I'd advise putting the values into a temp
table, ANALYZEing same, and doing "WHERE foo IN (SELECT x FROM tmp_table)".


If 8.2, what about
  ... WHERE foo IN (select x from (values (1$),(2$),...,(N$)) as t(x))
?

It would be interesting to see how that compares performance-wise.

Joe


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [SQL] Very large IN-clause is slow, but how to rewrite it?

2007-02-25 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> If 8.2, what about
>... WHERE foo IN (select x from (values (1$),(2$),...,(N$)) as t(x))
> ?

Well, the OP wasn't using 8.2 --- judging from the selected plan, it had
to be 8.0 or older.  But yeah, a values-list is an interesting
alternative on 8.2.  I think actually you don't need all that much extra
notation; this seems to work:

WHERE foo IN (VALUES ($1),($2),($3),...)

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] Very large IN-clause is slow, but how to rewrite it?

2007-02-25 Thread Richard Jones
On Sun, Feb 25, 2007 at 01:34:44PM -0500, Tom Lane wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
> > If 8.2, what about
> >... WHERE foo IN (select x from (values (1$),(2$),...,(N$)) as t(x))
> > ?
> 
> Well, the OP wasn't using 8.2 --- judging from the selected plan, it had
> to be 8.0 or older.  But yeah, a values-list is an interesting
> alternative on 8.2.  I think actually you don't need all that much extra
> notation; this seems to work:
> 
>   WHERE foo IN (VALUES ($1),($2),($3),...)

That's right, it is in fact PG 7.4.

I will try the values suggestion to see if that makes a difference.

Rich.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [SQL] Insert based in a select

2007-02-25 Thread Phillip Smith
(Please reply to the list when replying)

 

The error is correct - you are telling PG to insert one row (the literal
values you've passed to INSERT), but the sub-query is returning multiple
rows which won't fit in a single row.

 

I see 2 options, but someone else I'm sure will have a more elegant way to
do it:

1) insert into base.ingresso (select id from base.cartao order by id)

2) insert into base.ingresso values (nextval(' base.ingresso_id'), 4, now(),
12.34, 12.34, 1, 1678, (select id from base.cartao where id not in (SELECT
id from base.ingresso) order by id limit 1), 2, 25, 99)

 

Number 1 will create one row in base.ingresso for each row in base.cartao.
You will then need to use UPDATE to adjust the values in each of those rows
to the other values you want.

 

Number 2 would need to be run over and over again - not a practical option I
imagine.

 

~p

 

-Original Message-
From: Ezequias Rodrigues da Rocha [mailto:[EMAIL PROTECTED] 
Sent: Monday, 26 February 2007 00:43
To: [EMAIL PROTECTED]
Subject: Re: [SQL] Insert based in a select

 

Phillip,

Thank you for the information but the master (id) is only a field of my
secondary table.

My sql statement is like this:

insert into base.ingresso values (nextval(' base.ingresso_id'), 4, now(),
12.34, 12.34, 1, 1678, (select id from base.cartao order by id), 2, 25, 99)

And I am getting the following error return message:

ERROR: more than one row returned by a subquery used as an expression 
SQL state: 21000

Could you give me another help with this ?

Ezequias



***Confidentiality and Privilege Notice***

The material contained in this message is privileged and confidential to
the addressee.  If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy
or deliver this message to anyone, and you should destroy it and kindly
notify the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta.
Weatherbeeta, its employees, contractors or associates shall not be liable
for direct, indirect or consequential loss arising from transmission of this
message or any attachments



Re: [SQL] Insert based in a select

2007-02-25 Thread Ezequias Rodrigues da Rocha

The second statement works but only 1 row was affected. It do not apply to
me.

The first statement i didn't understand. You are adding only the first field
on my base.ingresso table ?

Regards
Ezequias

2007/2/25, Phillip Smith <[EMAIL PROTECTED]>:


 (Please reply to the list when replying)



The error is correct – you are telling PG to insert one row (the literal
values you've passed to INSERT), but the sub-query is returning multiple
rows which won't fit in a single row.



I see 2 options, but someone else I'm sure will have a more elegant way to
do it:

1) insert into base.ingresso (select id from base.cartao order by id)

2) insert into base.ingresso values (nextval(' base.ingresso_id'), 4,
now(), 12.34, 12.34, 1, 1678, (select id from base.cartao where id not in
(SELECT id from base.ingresso) order by id limit 1), 2, 25, 99)



Number 1 will create one row in base.ingresso for each row in base.cartao.
You will then need to use UPDATE to adjust the values in each of those rows
to the other values you want.



Number 2 would need to be run over and over again – not a practical option
I imagine…



~p



-Original Message-
*From:* Ezequias Rodrigues da Rocha [mailto:[EMAIL PROTECTED]
*Sent:* Monday, 26 February 2007 00:43
*To:* [EMAIL PROTECTED]
*Subject:* Re: [SQL] Insert based in a select



Phillip,

Thank you for the information but the master (id) is only a field of my
secondary table.

My sql statement is like this:

*insert into base.ingresso values (nextval(' base.ingresso_id'), 4, now(),
12.34, 12.34, 1, 1678, (select id from base.cartao order by id), 2, 25,
99)*

And I am getting the following error return message:

*ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000
*
Could you give me another help with this ?

Ezequias

Confidentiality and Privilege
Notice

The material contained in this message is privileged and confidential to
the addressee. If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy or
deliver this message to anyone, and you should destroy it and kindly notify
the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by
Weatherbeeta. Weatherbeeta, its employees, contractors or associates shall
not be liable for direct, indirect or consequential loss arising from
transmission of this message or any attachments





--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Atenciosamente (Sincerely)
   Ezequias Rodrigues da Rocha
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
A pior das democracias ainda é melhor do que a melhor das ditaduras
The worst of democracies is still better than the better of dictatorships
http://ezequiasrocha.blogspot.com/


Re: [SQL] Insert based in a select

2007-02-25 Thread Phillip Smith
Apologies – the first statement should have included the column name to
insert to:

INSERT INTO base.ingresso (id_column) (SELECT id FROM base.cartao ORDER BY
id)

 

That will insert one row in ingresso for each row in cartao – only changing
the id column. All the other columns in each row will be populated with the
default values. That is why you will need to do an UPDATE afterwards.
Something like:

UPDATE base.ingresso

SET col1 = ‘value1’, sol2 = ‘value2’

WHERE id IN (SELECT id FROM base.cartao);

 

Hope this makes sense,

~p

 

 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Ezequias Rodrigues da Rocha
Sent: Monday, 26 February 2007 10:58
To: Phillip Smith
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] Insert based in a select

 

The second statement works but only 1 row was affected. It do not apply to
me.

The first statement i didn't understand. You are adding only the first field
on my base.ingresso table ?

Regards
Ezequias

2007/2/25, Phillip Smith <[EMAIL PROTECTED]>:

(Please reply to the list when replying)

 

The error is correct – you are telling PG to insert one row (the literal
values you've passed to INSERT), but the sub-query is returning multiple
rows which won't fit in a single row.

 

I see 2 options, but someone else I'm sure will have a more elegant way to
do it:

1) insert into base.ingresso (select id from base.cartao order by id)

2) insert into base.ingresso values (nextval(' base.ingresso_id'), 4, now(),
12.34, 12.34, 1, 1678, (select id from base.cartao where id not in (SELECT
id from base.ingresso) order by id limit 1), 2, 25, 99)

 

Number 1 will create one row in base.ingresso for each row in base.cartao.
You will then need to use UPDATE to adjust the values in each of those rows
to the other values you want.

 

Number 2 would need to be run over and over again – not a practical option I
imagine…

 

~p

 

-Original Message-
From: Ezequias Rodrigues da Rocha [mailto:[EMAIL PROTECTED] 
Sent: Monday, 26 February 2007 00:43
To: [EMAIL PROTECTED]
Subject: Re: [SQL] Insert based in a select

 

Phillip,

Thank you for the information but the master (id) is only a field of my
secondary table.

My sql statement is like this:

insert into base.ingresso values (nextval(' base.ingresso_id'), 4, now(),
12.34, 12.34, 1, 1678, (select id from base.cartao order by id), 2, 25, 99)

And I am getting the following error return message:

ERROR: more than one row returned by a subquery used as an expression 
SQL state: 21000

Could you give me another help with this ?

Ezequias

 

***Confidentiality and Privilege Notice*** 

The material contained in this message is privileged and confidential to the
addressee. If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy or
deliver this message to anyone, and you should destroy it and kindly notify
the sender by reply email. 

Information in this message that does not relate to the official business of
Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta.
Weatherbeeta, its employees, contractors or associates shall not be liable
for direct, indirect or consequential loss arising from transmission of this
message or any attachments 




-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  Atenciosamente (Sincerely)
Ezequias Rodrigues da Rocha 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
A pior das democracias ainda é melhor do que a melhor das ditaduras
The worst of democracies is still better than the better of dictatorships 
http://ezequiasrocha.blogspot.com/ 



***Confidentiality and Privilege Notice***

The material contained in this message is privileged and confidential to
the addressee.  If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy
or deliver this message to anyone, and you should destroy it and kindly
notify the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta.
Weatherbeeta, its employees, contractors or associates shall not be liable
for direct, indirect or consequential loss arising from transmission of this
message or any attachments



Re: [SQL] Insert based in a select

2007-02-25 Thread Ezequias Rodrigues da Rocha

Im sorry but I didn't understand what you mean with the UPDATE.

Anyway, it don't apply to my point. I must do an insert in each row of my
ingresso table with his Id, some fields and the id of my cartao table.

You help is very welcomed but my solution doesn't came this time.

My best regards
Ezequias

2007/2/25, Phillip Smith <[EMAIL PROTECTED]>:


 Apologies – the first statement should have included the column name to
insert to:

INSERT INTO base.ingresso (id_column) (SELECT id FROM base.cartao ORDER BY
id)



That will insert one row in ingresso for each row in cartao – only
changing the id column. All the other columns in each row will be populated
with the default values. That is why you will need to do an UPDATE
afterwards. Something like:

UPDATE base.ingresso

SET col1 = 'value1', sol2 = 'value2'

WHERE id IN (SELECT id FROM base.cartao);



Hope this makes sense,

~p





-Original Message-
*From:* [EMAIL PROTECTED] [mailto:
[EMAIL PROTECTED] *On Behalf Of *Ezequias Rodrigues da Rocha
*Sent:* Monday, 26 February 2007 10:58
*To:* Phillip Smith
*Cc:* pgsql-sql@postgresql.org
*Subject:* Re: [SQL] Insert based in a select



The second statement works but only 1 row was affected. It do not apply to
me.

The first statement i didn't understand. You are adding only the first
field on my base.ingresso table ?

Regards
Ezequias

2007/2/25, Phillip Smith <[EMAIL PROTECTED]>:

(Please reply to the list when replying)



The error is correct – you are telling PG to insert one row (the literal
values you've passed to INSERT), but the sub-query is returning multiple
rows which won't fit in a single row.



I see 2 options, but someone else I'm sure will have a more elegant way to
do it:

1) insert into base.ingresso (select id from base.cartao order by id)

2) insert into base.ingresso values (nextval(' base.ingresso_id'), 4,
now(), 12.34, 12.34, 1, 1678, (select id from base.cartao where id not in
(SELECT id from base.ingresso) order by id limit 1), 2, 25, 99)



Number 1 will create one row in base.ingresso for each row in base.cartao.
You will then need to use UPDATE to adjust the values in each of those rows
to the other values you want.



Number 2 would need to be run over and over again – not a practical option
I imagine…



~p



-Original Message-
*From:* Ezequias Rodrigues da Rocha [mailto:[EMAIL PROTECTED]
*Sent:* Monday, 26 February 2007 00:43
*To:* [EMAIL PROTECTED]
*Subject:* Re: [SQL] Insert based in a select



Phillip,

Thank you for the information but the master (id) is only a field of my
secondary table.

My sql statement is like this:

*insert into base.ingresso values (nextval(' base.ingresso_id'), 4, now(),
12.34, 12.34, 1, 1678, (select id from base.cartao order by id), 2, 25,
99)*

And I am getting the following error return message:

*ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000
*
Could you give me another help with this ?

Ezequias



Confidentiality and Privilege
Notice

The material contained in this message is privileged and confidential to
the addressee. If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy or
deliver this message to anyone, and you should destroy it and kindly notify
the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by
Weatherbeeta. Weatherbeeta, its employees, contractors or associates shall
not be liable for direct, indirect or consequential loss arising from
transmission of this message or any attachments




--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  Atenciosamente (Sincerely)
Ezequias Rodrigues da Rocha
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
A pior das democracias ainda � melhor do que a melhor das ditaduras
The worst of democracies is still better than the better of dictatorships
http://ezequiasrocha.blogspot.com/

Confidentiality and Privilege
Notice

The material contained in this message is privileged and confidential to
the addressee. If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy or
deliver this message to anyone, and you should destroy it and kindly notify
the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by
Weatherbeeta. Weatherbeeta, its employees, contractors or associates shall
not be liable for direct, indirect or consequential loss arising from
transmission of this message or any attachments





--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Atenciosament