Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-23 Thread Stephen Crowley
Thanks for the explanation. So what sort of changes need to be made to
the client/server protocol to fix this problem?



On Thu, 23 Sep 2004 18:22:15 -0500 (EST), Kris Jurka [EMAIL PROTECTED] wrote:
 
 
 On Tue, 14 Sep 2004, Stephen Crowley wrote:
 
  Problem solved. I set the fetchSize to a reasonable value instead of
  the default of unlimited in the PreparedStatement, and now the query
  runs quickly. After some searching it seems this is a common problem;
  would it make sense to change the default value to something other
  than 0 in the JDBC driver?
 
 In the JDBC driver, setting the fetch size to a non-zero value means that
 the query will be run using what the frontend/backend protocol calls a
 named statement.  What this means on the backend is that the planner will
 not be able to use the values from the query parameters to generate the
 optimum query plan and must use generic placeholders and create a generic
 plan.  For this reason we have decided not to default to a non-zero
 fetch size.  The default could be made settable via a URL parameter
 if you think that is something that is really required.
 
 Kris Jurka
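
The generic-plan effect described above can be seen by preparing a
statement by hand and asking the backend to EXPLAIN it. Below is a
minimal sketch in JDBC, assuming the island_history table shown later
in this thread, a placeholder connection URL, and that EXPLAIN EXECUTE
is available on this server version:

import java.sql.*;

public class GenericPlanDemo {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details; adjust for your environment.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        Statement st = conn.createStatement();

        // A server-side prepared statement is planned against $1/$2
        // placeholders, just like the driver's named statements.
        st.execute("PREPARE q(date, varchar) AS " +
                "SELECT * FROM island_history WHERE date = $1 AND stock = $2");

        // EXPLAIN EXECUTE prints the generic plan the backend chose,
        // regardless of which concrete values are supplied here.
        ResultSet rs = st.executeQuery("EXPLAIN EXECUTE q('2004-09-07', 'QQQ')");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        st.close();
        conn.close();
    }
}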
 




Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-14 Thread Stephen Crowley
Problem solved. I set the fetchSize to a reasonable value instead of
the default of unlimited in the PreparedStatement, and now the query
runs quickly. After some searching it seems this is a common problem;
would it make sense to change the default value to something other
than 0 in the JDBC driver?
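
A minimal sketch of the fix described above, for anyone finding this
thread later. The connection URL and credentials are placeholders, and
the table is the island_history table shown further down the thread;
the driver also needs autocommit off so the portal behind the fetch
can stay open between batches:

import java.sql.*;

public class FetchSizeFix {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL and credentials.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        // Without this, the driver cannot keep the cursor-backed portal
        // open across fetches and falls back to buffering everything.
        conn.setAutoCommit(false);

        PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM island_history WHERE date = ? AND stock = ?");
        ps.setFetchSize(1000);                  // ~1000 rows per round trip
        ps.setDate(1, Date.valueOf("2004-09-07"));
        ps.setString(2, "QQQ");

        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // Process one row at a time; only one batch is in memory.
        }
        rs.close();
        ps.close();
        conn.commit();
        conn.close();
    }
}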

If I get some extra time I'll look into libpq and see what is required
to fix the API. Most third-party programs and existing JDBC apps won't
work with the current paradigm when returning large result sets.

Thanks,
Stephen



On Mon, 13 Sep 2004 21:49:14 -0400, Tom Lane [EMAIL PROTECTED] wrote:
 Stephen Crowley [EMAIL PROTECTED] writes:
  On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane [EMAIL PROTECTED] wrote:
  Stephen Crowley [EMAIL PROTECTED] writes:
  Does postgres cache the entire result set before it begins returning
  data to the client?
 
  The backend doesn't, but libpq does, and I think JDBC does too.
 
  That is incredible. Why would libpq do such a thing?
 
 Because the API it presents doesn't allow for the possibility of query
 failure after having given you back a PGresult: either you have the
 whole result available with no further worries, or you don't.
 If you think it's incredible, let's see you design an equally
 easy-to-use API that doesn't make this assumption.
 
 (Now having said that, I would have no objection to someone extending
 libpq to offer an alternative streaming API for query results.  It
 hasn't got to the top of anyone's to-do list though ... and I'm
 unconvinced that psql could use it if it did exist.)



[PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Stephen Crowley
Does postgres cache the entire result set before it begins returning
data to the client?

I have a table with ~8 million rows and I am executing a query which
should return about ~800,000 rows. The problem is that as soon as I
execute the query it absolutely kills my machine and begins swapping
for 5 or 6 minutes before it begins returning results. Is postgres
trying to load the whole result set into memory before returning anything?
Also, why would it choose not to use the index? It is properly
estimating the # of rows returned. If I set enable_seqscan to off it
is just as slow.

Running postgres 8.0 beta2 dev2

explain select * from island_history where date='2004-09-07' and stock='QQQ';
                                QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on island_history  (cost=0.00..266711.23 rows=896150 width=83)
   Filter: ((date = '2004-09-07'::date) AND ((stock)::text = 'QQQ'::text))
(2 rows)

Any help would be appreciated.

--Stephen

                Table "public.island_history"
      Column      |          Type          | Modifiers
------------------+------------------------+-----------
 date             | date                   | not null
 stock            | character varying(6)   |
 time             | time without time zone | not null
 reference_number | numeric(9,0)           | not null
 message_type     | character(1)           | not null
 buy_sell_ind     | character(1)           |
 shares           | numeric(6,0)           |
 remaining_shares | numeric(6,0)           |
 price            | numeric(10,4)          |
 display          | character(1)           |
 match_number     | numeric(9,0)           | not null
Indexes:
    "island_history_pkey" PRIMARY KEY, btree (date, reference_number, message_type, time, match_number)
    "island_history_date_stock_time" btree (date, stock, time)
    "island_history_oid" btree (oid)



Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Tom Lane
Stephen Crowley [EMAIL PROTECTED] writes:
 Does postgres cache the entire result set before it begins returning
 data to the client?

The backend doesn't, but libpq does, and I think JDBC does too.

I'd recommend using a cursor so you can FETCH a reasonable number of
rows at a time.
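
The cursor approach looks roughly like this sketch in JDBC (the same
DECLARE/FETCH statements work verbatim in psql); the connection details
are placeholders and the batch size is arbitrary:

import java.sql.*;

public class CursorFetch {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        conn.setAutoCommit(false);      // cursors live inside a transaction
        Statement st = conn.createStatement();

        st.execute("DECLARE hist_cur CURSOR FOR " +
                "SELECT * FROM island_history " +
                "WHERE date = '2004-09-07' AND stock = 'QQQ'");

        // Pull the result in batches instead of all ~800,000 rows at once.
        while (true) {
            ResultSet rs = st.executeQuery("FETCH FORWARD 10000 FROM hist_cur");
            int n = 0;
            while (rs.next()) {
                n++;                    // process one row here
            }
            rs.close();
            if (n == 0) break;          // cursor exhausted
        }

        st.execute("CLOSE hist_cur");
        conn.commit();
        conn.close();
    }
}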

 Also, why would it choose not to use the index?

Selecting 1/10th of a table is almost always a poor candidate for an
index scan.  You've got about 100 rows per page (assuming the planner's
width estimate is credible: an 8192-byte page divided by the 83-byte
row width comes out near 100), and so on average every page of the
table has about ten rows that need to be picked up and returned.  You
might as well just seqscan and be sure you don't read any page more
than once.

regards, tom lane



Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Stephen Crowley
On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane [EMAIL PROTECTED] wrote:
 Stephen Crowley [EMAIL PROTECTED] writes:
  Does postgres cache the entire result set before it begins returning
  data to the client?
 
 The backend doesn't, but libpq does, and I think JDBC does too.
 
 I'd recommend using a cursor so you can FETCH a reasonable number of
 rows at a time.

That is incredible. Why would libpq do such a thing? JDBC as well? I
know Oracle doesn't do anything like that; I'm not sure about MySQL. Is
there any way to turn it off? In this case I was just using psql, but I
will be using JDBC for the app.  About cursors: I thought a JDBC
ResultSet WAS a cursor, am I mistaken?

Thanks,
Stephen



Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Tom Lane
Stephen Crowley [EMAIL PROTECTED] writes:
 On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane [EMAIL PROTECTED] wrote:
 Stephen Crowley [EMAIL PROTECTED] writes:
 Does postgres cache the entire result set before it begins returning
 data to the client?
 
 The backend doesn't, but libpq does, and I think JDBC does too.

 That is incredible. Why would libpq do such a thing?

Because the API it presents doesn't allow for the possibility of query
failure after having given you back a PGresult: either you have the
whole result available with no further worries, or you don't.
If you think it's incredible, let's see you design an equally
easy-to-use API that doesn't make this assumption.

(Now having said that, I would have no objection to someone extending
libpq to offer an alternative streaming API for query results.  It
hasn't got to the top of anyone's to-do list though ... and I'm
unconvinced that psql could use it if it did exist.)

regards, tom lane
