[PERFORM] Function with table%ROWTYPE globbing

2005-11-03 Thread Sven Willenberger
Postgresql 8.0.4 using plpgsql

The basic function is set up as:
CREATE FUNCTION add_data(t_row mytable) RETURNS VOID AS $func$
DECLARE
newtable text;
thesql text;
BEGIN
SELECT INTO newtable thename FROM lookup WHERE lookup.id = t_row.id;
thesql := 'INSERT INTO ' || newtable || ' VALUES (' || t_row.* || ')';
EXECUTE thesql;
RETURN;
END;
$func$ LANGUAGE plpgsql VOLATILE;

SELECT add_data(t.*) FROM mytable t where 
ERROR:  column "*" not found in data type mytable

Now I have tried to drop the * but then there is no concatenation
function to join text to a table%ROWTYPE. So my question is: how can I
build this dynamic insert statement without listing out every
t_row.colname? Or, alternatively, is there a better way to parse out
each row of a table into subtables based on a column value?

Sven





Re: [PERFORM] effective cache size on FreeBSD (WAS: Performance on SUSE w/ reiserfs)

2005-10-11 Thread Sven Willenberger
On Tue, 2005-10-11 at 16:54 +0200, Claus Guttesen wrote:
> > > I have a postgresql 7.4.8-server with 4 GB ram.
> > > #effective_cache_size = 1000# typically 8KB each
> > >
> > > This is computed by sysctl -n vfs.hibufspace / 8192 (on FreeBSD). So I
> > > changed it to:
> > >
> > > effective_cache_size = 27462# typically 8KB each
> >
> > Apparently this formula is no longer relevant on the FreeBSD systems as
> > it can cache up to almost all the available RAM. With 4GB of RAM, one
> > could specify most of the RAM as being available for caching, assuming
> > that nothing but PostgreSQL runs on the server -- certainly 1/2 the RAM
> > would be a reasonable value to tell the planner.
> >
> > (This was verified by using dd:
> > dd if=/dev/zero of=/usr/local/pgsql/iotest bs=128k count=16384 to create
> > a 2G file then
> > dd if=/usr/local/pgsql/iotest of=/dev/null
> >
> > If you run systat -vmstat 2 you will see 0% diskaccess during the read
> > of the 2G file indicating that it has, in fact, been cached)
> 
> Thank you for your reply. Does this apply to FreeBSD 5.4 or 6.0 on
> amd64 (or both)?
> 

Not sure about 6.0 (but I don't know why it would change) but definitely
on 5.4 amd64 (and I would imagine i386 as well).

Sven




Re: [PERFORM] Performance on SUSE w/ reiserfs

2005-10-11 Thread Sven Willenberger
On Tue, 2005-10-11 at 09:41 +0200, Claus Guttesen wrote:
> I have a postgresql 7.4.8-server with 4 GB ram.


> 
> > #effective_cache_size = 1000# typically 8KB each
> 
> This is computed by sysctl -n vfs.hibufspace / 8192 (on FreeBSD). So I
> changed it to:
> 
> effective_cache_size = 27462# typically 8KB each

Apparently this formula is no longer relevant on FreeBSD systems, as
FreeBSD can cache up to almost all the available RAM. With 4GB of RAM, one
could specify most of the RAM as being available for caching, assuming
that nothing but PostgreSQL runs on the server -- certainly 1/2 the RAM
would be a reasonable value to tell the planner.

(This was verified by using dd: 
dd if=/dev/zero of=/usr/local/pgsql/iotest bs=128k count=16384 to create
a 2G file then
dd if=/usr/local/pgsql/iotest of=/dev/null

If you run systat -vmstat 2 you will see 0% disk access during the read
of the 2G file, indicating that it has, in fact, been cached.)
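As a rough worked example of the half-the-RAM suggestion above (assuming
the default 8KB page size): 2GB works out to 262144 8KB pages, so the
corresponding postgresql.conf entry would be something like:

effective_cache_size = 262144   # roughly 2GB worth of 8KB pages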


Sven




Re: [PERFORM] faster INSERT with possible pre-existing row?

2005-07-26 Thread Sven Willenberger
On Tue, 2005-07-26 at 10:50 -0600, Dan Harris wrote:
> I am working on a process that will be inserting tens of million rows  
> and need this to be as quick as possible.
> 
> The catch is that for each row I could potentially insert, I need to  
> look and see if the relationship is already there  to prevent  
> multiple entries.  Currently I am doing a SELECT before doing the  
> INSERT, but I recognize the speed penalty in doing two operations.  I  
> wonder if there is some way I can say "insert this record, only if it  
> doesn't exist already".  To see if it exists, I would need to compare  
> 3 fields instead of just enforcing a primary key.
> 
> Even if this could be a small increase per record, even a few percent  
> faster compounded over the whole load could be a significant reduction.
> 
> Thanks for any ideas you might have.
> 

Perhaps a trigger:

CREATE FUNCTION verify_unique() RETURNS TRIGGER AS $func$
BEGIN
PERFORM a,b,c FROM table1 WHERE a = NEW.a and b = NEW.b and c = NEW.c;
IF FOUND THEN 
RETURN NULL;
END IF;
RETURN NEW;
END;
$func$ LANGUAGE plpgsql STABLE;

CREATE TRIGGER verify_unique BEFORE INSERT ON table1 FOR EACH ROW
EXECUTE PROCEDURE verify_unique();

Triggers are fired on COPY commands, and if table1 can be cached and you
have an index on table1(a,b,c), the results should be fairly decent. I
would be interested in seeing the difference in timing between this
approach and the temp table approach.
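For completeness, the supporting pieces would look something like this
(the column names a, b, c and the file path are just placeholders):

CREATE INDEX table1_abc_idx ON table1 (a, b, c);

COPY table1 (a, b, c) FROM '/path/to/loadfile';
-- rows whose (a,b,c) combination already exists are silently skipped,
-- since the BEFORE INSERT trigger returns NULL for them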




Re: [PERFORM] Looking for tips

2005-07-19 Thread Sven Willenberger
On Tue, 2005-07-19 at 17:04 -0400, Oliver Crosby wrote:
> > since triggers work with COPY, you could probably write a trigger that
> > looks for this condition and does the ID processing you need; you could
> > thereby enjoy the enormous speed gain resulting from COPY and maintain
> > your data continuity.
> 
> So... (bear with me here.. trying to make sense of this)..
> With triggers there's a way I can do the parsing I need to on a log
> file and react to completed events in non-sequential order (you can
> ignore that part.. it's just how we piece together different related
> events) and then have perl/DBD::Pg invoke a copy command (which, from
> what I can tell, has to operate on a file...) and the copy command can
> feed the ID I need back to perl so I can work with it...
> If that doesn't hurt my brain, then I'm at least kinda confused...
> Anyway. Heading home now. I'll think about this more tonight/tomorrow.
> 

Well, without knowing the specifics of what you are actually trying to
accomplish I cannot say yes or no to your question. I am not sure where
the data you are inserting into the db is coming from. However, if the
scenario is this: a) attempt to insert a row, b) if the row exists
already, grab its ID and do other db selects/inserts/deletes based on
that ID, then there is no need to feed this information back to the
perl script. Is your perl script parsing a file and then using the parsed
information to insert rows? If so, how is the ID that is returned used?
Could the trigger use the ID that would be returned to perform
whatever it is that your perl script is trying to accomplish with that
ID?

It's all kind of vague so my answers may or may not help, but based on
the [lack of] specifics you have provided, I fear that is the best
suggestion that I can offer at this point.
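Purely to illustrate that scenario -- the table and column names here
(items, item_log, name, id) are made up -- such a trigger could look
something like:

CREATE FUNCTION items_before_insert() RETURNS TRIGGER AS $func$
DECLARE
    existing_id integer;
BEGIN
    SELECT id INTO existing_id FROM items WHERE name = NEW.name;
    IF FOUND THEN
        -- do whatever the perl script would otherwise have done with the ID
        INSERT INTO item_log (item_id, seen_at) VALUES (existing_id, now());
        RETURN NULL;  -- skip inserting the duplicate row
    END IF;
    RETURN NEW;
END;
$func$ LANGUAGE plpgsql;

CREATE TRIGGER items_before_insert BEFORE INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE items_before_insert();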

Sven




Re: [PERFORM] Looking for tips

2005-07-19 Thread Sven Willenberger
On Tue, 2005-07-19 at 16:28 -0400, Oliver Crosby wrote:
> > If it is possible try:
> > 1) wrapping many inserts into one transaction
> > (BEGIN;INSERT;INSERT;...INSERT;COMMIT;).  As PostgreSQL will need to
> > handle less transactions per second (each your insert is a transaction), it
> > may work faster.
> 
> Aye, that's what I have it doing right now. The transactions do save a
> HUGE chunk of time. (Cuts it down by about 40%).
> 
> > 2) If you can do 1, you could go further and use a COPY command which is
> > the fastest way to bulk-load a database.
> 
> I don't think I can use COPY in my case because I need to do
> processing on a per-line basis, and I need to check if the item I want
> to insert is already there, and if it is, I need to get its ID so I
> can use that for further processing.
> 

Since triggers work with COPY, you could probably write a trigger that
looks for this condition and does the ID processing you need; you could
thereby enjoy the enormous speed gain resulting from COPY and maintain
your data continuity.

Sven




Re: [PERFORM] Adaptec/LSI/?? RAID

2005-06-01 Thread Sven Willenberger



Stacy White presumably uttered the following on 06/01/05 23:42:

> We're in the process of buying another Opteron server to run Postgres, and
> based on the suggestions in this list I've asked our IT director to get an
> LSI MegaRaid controller rather than one of the Adaptecs.
>
> But when we tried to place our order, our vendor (Penguin Computing) advised
> us:
>
> "we find LSI does not work well with 4GB of RAM. Our engineering find that
> LSI card could cause system crashes. One of our customer ... has found that
> Adaptec cards works well on PostGres SQL -- they're using it as a preforce
> server with xfs and post-gress."
>
> Any comments?  Suggestions for other RAID controllers?




We use the LSI MegaRaid 320-2x with the battery-backed cache on a dual 
Opteron system that uses 8G of RAM. The OS is FreeBSD amd64 (5.4) and it runs 
without hesitation. The database is currently over 100GB and it performs 
admirably. So chalk one anecdotal item towards the LSI column. To be 
fair, I have not tried an Adaptec card with this setup so I cannot 
comment positively or negatively on that card. As a side note, we did 
have issues with this setup with Linux (2.6 kernel, 64-bit) and the XFS file 
system (we generally use FreeBSD but I wanted to try other 64-bit OSes 
before committing). Whether the Linux issues were due to the LSI, 
memory, Tyan mobo, or something else was never determined -- FreeBSD ran 
it and did so without flinching, so our choice was easy.


HTH

Sven



Re: [PERFORM] PostgreSQL on Solaris 8 and ufs

2005-03-22 Thread Sven Willenberger
On Tue, 2005-03-22 at 14:44 -0600, Brandon Metcalf wrote:
> We've recently moved our pgsql installation and DBs to a Solaris 8
> machine with striped and mirrored ufs filesystem that houses the DB
> data.  We are now seeing terrible performance and the bottleneck is no
> doubt disk I/O.
> 
> We've tried modifying a tunables related to ufs, but it doesn't seem
> to be helping.
> 
> Is there anything we should be looking at that is specifically related
> to ufs filesystems on Solaris 8 or possibly something in general that
> would improve performance?
> 
> Thanks.
> 

What are you using to create your RAID? You say it is "no doubt disk
I/O" - does iostat confirm this? A lot of performance issues are related
to the size of the stripe you chose for the striped portion of the
array, the actual array configuration, etc. I am assuming you have
looked at system variables such as autoup and the like? What tweaks
have you done?

Also, are your pg_xlog and data directories separated onto separate
volumes? Doing so will help immensely. What are you using to measure
performance?

Sven




Re: [PERFORM] Inheritence versus delete from

2005-03-01 Thread Sven Willenberger
On Tue, 2005-03-01 at 09:48 -0600, John Arbash Meinel wrote:
> Sven Willenberger wrote:
> 
> >Trying to determine the best overall approach for the following
> >scenario:
> >
> >Each month our primary table accumulates some 30 million rows (which
> >could very well hit 60+ million rows per month by year's end). Basically
> >there will end up being a lot of historical data with little value
> >beyond archival.
> >
> >The question arises then as the best approach of which I have enumerated
> >three:
> >
> 
> I just thought of another possibility. You could create each table
> month-by-month, and then use a view to combine them, and possibly a rule
> to keep things clean.
> 
> So you would do something like:
> 
> I will assume you already have the data in one big table to show the
> easiest way to create the small tables.
> 
> create table tblname-2005-01 as select * from orig_tbl where day >=
> '2005-01-01' and day < '2005-02-01';
> create table tblname-2005-02 as select * from orig_tbl where day >=
> '2005-02-01' and day < '2005-03-01';
> create table tblname-2005-03 as select * from orig_tbl where day >=
> '2005-03-01' and day < '2005-04-01';
> -- create appropriate indicies, rules, constraints on these tables
> 
> Then you create a view which includes all of these tables.
> 
> create or replace view tblname as
> select * from tblname-2005-01
> union all select * from tblname-2005-02
> union all select * from tblname-2005-03
> ;
> 
> Then create insert and update rules which fix which table gets the new
> data.
> 
> create rule up_tblname as on update to tblname do instead
> update tblname-2005-03 set
>col1 = NEW.col1,
>col2 = NEW.col2,
>...
> where id = NEW.id;
> -- This assumes that you have a unique id on your tables. This is just
> whatever your
> -- primary key is, so it should be a decent assumption.
> 
> create rule ins_tblname as on insert to tblname do instead
> insert into tblname-2005-03 (col1, col2, ...)
> values (new.col1, new.col2, ...);
> 
> Now the downside of this method, is that every month you need to create
> a new table, and then update the views and the rules. The update rules
> are pretty straightforward, though.
> 
> The nice thing is that it keeps your data partitioned, and you don't
> ever have a large select/delete step. You probably will want a small one
> each month to keep the data exactly aligned by month. You don't really
> have to have exact alignments, but as humans, we tend to like that stuff. :)
> 
> Probably this is more overhead than you would like to do. Especially if
> you know that you can get away with method 2 (keep 1 big table, and just
> remove old rows out of it every month.)
> 
> But this method means that all of your data stays live, but queries with
> appropriate restrictions should stay fast. You also have the ability
> (with v8.0) to move the individual tables onto separate disks.
> 
> One more time, though, if you can get away with removing old data and
> just archiving it, do so. But if you want to keep the data live, there
> are a couple of alternatives.
> 

Actually that was the thought behind my using inheritance; when querying
the parent table, it basically does a union all. Also, I think it would be
quicker to insert directly into the child table (simply by modifying my
query once a month) rather than sustain the overhead of the rule.

Since the child tables are individual tables, all the benefits you
cite above still hold.
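A minimal sketch of that setup (bigtable and its columns here are just
placeholder names):

CREATE TABLE bigtable (targetdate timestamp, payload text);
CREATE TABLE bigtable_2005_03 () INHERITS (bigtable);

-- each month's load goes straight into the current child table
INSERT INTO bigtable_2005_03 VALUES ('2005-03-01 00:00:00', 'some row');

-- querying the parent scans the parent plus all children,
-- effectively a UNION ALL
SELECT count(*) FROM bigtable WHERE targetdate >= '2005-03-01';

-- archiving a month later is then just a pg_dump of that child table
-- followed by a DROP TABLE on it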

Thanks for the input on this ... will have to try a couple of things to see
which is most manageable.

Sven




[PERFORM] Inheritence versus delete from

2005-02-28 Thread Sven Willenberger
Trying to determine the best overall approach for the following
scenario:

Each month our primary table accumulates some 30 million rows (which
could very well hit 60+ million rows per month by year's end). Basically
there will end up being a lot of historical data with little value
beyond archival.

The question then arises as to the best approach, of which I have enumerated
three:

1) Just allow the records to accumulate and maintain constant vacuuming,
etc allowing for the fact that most queries will only be from a recent
subset of data and should be mostly cached.

2) Each month:
SELECT * INTO 3monthsago_dynamically_named_table FROM bigtable WHERE
targetdate < $3monthsago;
DELETE FROM bigtable where targetdate < $3monthsago;
VACUUM ANALYZE bigtable;
pg_dump 3monthsago_dynamically_named_table for archiving;

3) Each month:
CREATE TABLE newmonth_dynamically_named_table () INHERITS (mastertable);
modify the copy.sql script to copy newmonth_dynamically_named_table;
pg_dump 3monthsago_dynamically_named_table for archiving;
drop table 3monthsago_dynamically_named_table;

Any takes on which approach makes most sense from a performance and/or
maintenance point of view and are there other options I may have missed?

Sven Willenberger




Re: [PERFORM] Using LIMIT changes index used by planner

2004-12-14 Thread Sven Willenberger
On Mon, 2004-12-13 at 17:43 -0500, Tom Lane wrote:
> Sven Willenberger <[EMAIL PROTECTED]> writes:
> > explain analyze select storelocation,order_number from custacct where 
> > referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07 
> > 12:00:00' order by custacctid limit 10;
>  
> >   QUERY PLAN 
> 
> > ---
> >   Limit  (cost=0.00..43065.76 rows=10 width=43) (actual 
> > time=1306957.216..1307072.111 rows=10 loops=1)
> > ->  Index Scan using custacct2_pkey on custacct 
> > (cost=0.00..92083209.38 rows=21382 width=43) (actual 
> > time=1306957.205..1307072.017 rows=10 loops=1)
> >   Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07 
> > 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 
> > 12:00:00'::timestamp without time zone))
> >   Total runtime: 1307072.231 ms
> > (4 rows)
> 
> I think this is the well-known issue of lack of cross-column correlation
> statistics.  The planner is well aware that this indexscan will be
> horridly expensive if run to completion --- 
> 
> There isn't any near-term fix in the wind for this, since storing
> cross-column statistics is an expensive proposition that we haven't
> decided how to handle.  Your workaround with separating the ORDER BY
> from the LIMIT is a good one.
> 

You are correct in that there is a high degree of correlation between
the custacctid (which is a serial key) and the orderdate as the orders
generally get entered in the order that they arrive. I will go with the
workaround subselect query plan then.

On a related note, is there a way (other than set enable_seqscan=off) to
give a hint to the planner that it is cheaper to use an index scan
versus a seq scan? Using the "workaround" query on any time period greater
than 12 hours results in the planner using a seq scan. Disabling the seq
scan and running the query on a full day period, for example, shows:

explain analyze select foo.storelocation, foo.order_number from (select
storelocation,order_number from custacct where referrer = 1365  and
orderdate between '2004-12-09' and '2004-12-10' order by custacctid) as
foo  limit 10 offset 100;

QUERY PLAN
---
 Limit  (cost=2661326.22..2661326.35 rows=10 width=100) (actual
time=28446.605..28446.796 rows=10 loops=1)
   ->  Subquery Scan foo  (cost=2661324.97..2661866.19 rows=43297
width=100) (actual time=28444.916..28446.298 rows=110 loops=1)
 ->  Sort  (cost=2661324.97..2661433.22 rows=43297 width=41)
(actual time=28444.895..28445.334 rows=110 loops=1)
   Sort Key: custacctid
   ->  Index Scan using orderdate_idx on custacct
(cost=0.00..2657990.68 rows=43297 width=41) (actual
time=4.432..28145.212 rows=44333 loops=1)
 Index Cond: ((orderdate >= '2004-12-09
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-10
00:00:00'::timestamp without time zone))
 Filter: (referrer = 1365)
 Total runtime: 28456.893 ms
(8 rows)


If I interpret the above correctly, the planner estimates a cost of
2661326 but the actual cost is much less (assuming time is equivalent to
cost). Would the set statistics command be of any benefit here in
"training" the planner?

Sven






Re: [PERFORM] Using LIMIT changes index used by planner

2004-12-13 Thread Sven Willenberger

Andrew McMillan wrote:
> On Mon, 2004-12-13 at 01:13 -0500, Sven Willenberger wrote:
> > I have a question regarding a serious performance hit taken when using a
> > LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB
> > of memory. The table in question contains some 25 million rows with a
> > bigserial primary key, orderdate index and a referrer index. The 2
> > select statements are as follows:
>
> It's an interesting question, but to be able to get answers from this
> list you will need to provide "EXPLAIN ANALYZE ..." rather than just
> "EXPLAIN ...".

A) Query without limit clause:
explain analyze select storelocation,order_number from custacct where 
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07 
12:00:00' order by custacctid;

   QUERY PLAN 

---
 Sort  (cost=1226485.32..1226538.78 rows=21382 width=43) (actual 
time=30340.322..30426.274 rows=21432 loops=1)
   Sort Key: custacctid
   ->  Index Scan using orderdate_idx on custacct 
(cost=0.00..1224947.52 rows=21382 width=43) (actual 
time=159.218..30196.686 rows=21432 loops=1)
 Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp 
without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp 
without time zone))
 Filter: (referrer = 1365)
 Total runtime: 30529.151 ms
(6 rows)


A2) Same query run again, to see effect of caching:
explain analyze select storelocation,order_number from custacct where 
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07 
12:00:00' order by custacctid;

   QUERY PLAN 

---
 Sort  (cost=1226485.32..1226538.78 rows=21382 width=43) (actual 
time=1402.410..1488.395 rows=21432 loops=1)
   Sort Key: custacctid
   ->  Index Scan using orderdate_idx on custacct 
(cost=0.00..1224947.52 rows=21382 width=43) (actual time=0.736..1259.964 
rows=21432 loops=1)
 Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp 
without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp 
without time zone))
 Filter: (referrer = 1365)
 Total runtime: 1590.675 ms
(6 rows)

***
B) Query run with LIMIT
explain analyze select storelocation,order_number from custacct where 
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07 
12:00:00' order by custacctid limit 10;

 QUERY PLAN 

---
 Limit  (cost=0.00..43065.76 rows=10 width=43) (actual 
time=1306957.216..1307072.111 rows=10 loops=1)
   ->  Index Scan using custacct2_pkey on custacct 
(cost=0.00..92083209.38 rows=21382 width=43) (actual 
time=1306957.205..1307072.017 rows=10 loops=1)
 Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07 
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 
12:00:00'::timestamp without time zone))
 Total runtime: 1307072.231 ms
(4 rows)


C) Query using the subselect variation
explain analyze select foo.storelocation, foo.order_number from (select 
storelocation,order_number from custacct where referrer = 1365  and 
orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by 
custacctid) as foo  limit 10;

 QUERY PLAN 

---
 Limit  (cost=1226485.32..1226485.45 rows=10 width=100) (actual 
time=1413.829..1414.024 rows=10 loops=1)
   ->  Subquery Scan foo  (cost=1226485.32..1226752.60 rows=21382 
width=100) (actual time=1413.818..1413.933 rows=10 loops=1)
 ->  Sort  (cost=1226485.32..1226538.78 rows=21382 width=43) 
(actual time=1413.798..1413.834 rows=10 loops=1)
   Sort Key: custacctid
   ->  Index Scan using orderdate_idx on custacct 
(cost=0.00..1224947.52 rows=21382 width=43) (actual time=0.740..1272.380 
rows=21432 loops=1)
 Index Cond: ((orderdate >= '2004-12-07 
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 
12:00:00'::timestamp without time zone))
 Filter: (referrer = 1365)
 Total runtime: 1418.964 ms
(8 rows)

Thanks,
Sven


[PERFORM] Using LIMIT changes index used by planner

2004-12-12 Thread Sven Willenberger
I have a question regarding a serious performance hit taken when using a 
LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB 
of memory. The table in question contains some 25 million rows with a 
bigserial primary key, orderdate index and a referrer index. The 2 
select statements are as follows:

A) select storelocation,order_number from custacct where referrer = 1365 
 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by 
custacctid;

B) select storelocation,order_number from custacct where referrer = 1365 
 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by 
custacctid limit 10;

So the only difference is the use of the LIMIT, which, in theory, should 
be quicker once custacctid is ordered.

Now the explain results:
A) explain select storelocation,order_number from custacct where 
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07 
12:00:00' order by custacctid;

   QUERY PLAN 

---
 Sort  (cost=904420.55..904468.11 rows=19025 width=44)
   Sort Key: custacctid
   ->  Index Scan using orderdate_idx on custacct 
(cost=0.00..903068.29 rows=19025 width=44)
 Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp 
without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp 
without time zone))
 Filter: (referrer = 1365)
(5 rows)


B) explain select storelocation,order_number from custacct where 
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07 
12:00:00' order by custacctid limit 10;

 QUERY PLAN 

---
 Limit  (cost=0.00..33796.50 rows=10 width=44)
   ->  Index Scan using custacct2_pkey on custacct 
(cost=0.00..64297840.86 rows=19025 width=44)
 Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07 
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 
12:00:00'::timestamp without time zone))
(3 rows)

***
Notice the huge cost difference in the two plans: 904468 in the one 
without LIMIT versus 64297840.86 for the index scan on the custacct 
primary key. Why would the planner switch from using the orderdate index 
to the custacctid primary key index (which is a BIGSERIAL, btw)?

I can change that behavior (and speed up the resultant query) by using 
the following subquery:

explain select foo.storelocation, foo.order_number from (select 
storelocation,order_number from custacct where referrer = 1365  and 
orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by 
custacctid) as foo  limit 10;

 QUERY PLAN 

---
 Limit  (cost=904420.55..904420.67 rows=10 width=100)
   ->  Subquery Scan foo  (cost=904420.55..904658.36 rows=19025 width=100)
 ->  Sort  (cost=904420.55..904468.11 rows=19025 width=44)
   Sort Key: custacctid
   ->  Index Scan using orderdate_idx on custacct 
(cost=0.00..903068.29 rows=19025 width=44)
 Index Cond: ((orderdate >= '2004-12-07 
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 
12:00:00'::timestamp without time zone))
 Filter: (referrer = 1365)
(7 rows)

As a side note, when running query A, the query takes 1772.523 ms; when 
running the subselect version to get the limit, it takes 1415.615 ms. 
Running option B (with the other index being scanned) takes several 
minutes (close to 10 minutes!). What am I missing about how the planner 
views the LIMIT statement?

Sven


[PERFORM] Overhead of dynamic query in trigger

2004-12-03 Thread Sven Willenberger
(Originally asked in [General]; realized that it would probably be 
better asked in [Perform].)

I am curious as to how much overhead building a dynamic query in a
trigger adds to the process. The example:
Have a list of subcontractors, each of which gets unique pricing. There 
is a total of roughly 100,000 items available and some 100 
subcontractors. The 2 design choices would be 100 tables (one for each 
sub) at 100,000 rows or 1 table with 10,000,000 rows.

Choice 1:
table has item number (indexed) and price
Choice 2:
table has subcontractor id, item number, and price; index on
(subcontractorid, item number).
Table of orders would have a trigger to insert line item cost:
---
Trigger Choice 1:
Select into thetable lookupprice from subcontractors where
subcontractorid = NEW.subcontractorid;
thequery := ''Select price from '' || thetable.lookupprice || '' where
itemnumber = '' || NEW.itemnumber;
FOR therow IN EXECUTE thequery LOOP
NEW.itemcost := therow.price;
END LOOP;
RETURN NEW;
---
Trigger Choice 2:
Select into thetable lookupprice from subcontractors where
subcontractorid = NEW.subcontractorid;
Select into therow price from mastertable where subcontractorid =
NEW.subcontractorid and itemnumber = NEW.itemnumber;
NEW.itemcost := therow.price;
RETURN NEW;
---
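For reference, Trigger Choice 1 fleshed out as a complete trigger function
might look roughly like the following (the function and trigger names, the
orders table, and the declarations are assumed; dollar quoting is used here
for readability):

CREATE FUNCTION insert_itemcost() RETURNS TRIGGER AS $func$
DECLARE
    thetable record;
    therow record;
    thequery text;
BEGIN
    -- find which per-subcontractor price table to consult
    SELECT INTO thetable lookupprice FROM subcontractors
        WHERE subcontractorid = NEW.subcontractorid;
    thequery := 'SELECT price FROM ' || thetable.lookupprice ||
        ' WHERE itemnumber = ' || NEW.itemnumber;
    FOR therow IN EXECUTE thequery LOOP
        NEW.itemcost := therow.price;
    END LOOP;
    RETURN NEW;
END;
$func$ LANGUAGE plpgsql;

CREATE TRIGGER insert_itemcost BEFORE INSERT ON orders
    FOR EACH ROW EXECUTE PROCEDURE insert_itemcost();
---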
Doing a select from the command line, the mastertable method (with the id
and partno index) is faster than looking up a single item in a named
table (with a partno index). At what point, if ever, would Trigger Choice 2
fall behind Trigger Choice 1 in performance? Is there a way to
analyze the performance of dynamic queries? If I had only 10
subcontractors, or if I had 1000 subcontractors, at what point is the
overhead of building/executing a dynamic query negated by the amount of
time it takes to look up both the subid and part number in one massive table?
Thanks,
Sven


Re: Off-list Re: [PERFORM] Alternatives to Dell?

2004-12-01 Thread Sven Willenberger

Josh Berkus wrote:
Jeff,

I'm curious about the problem's you're seeing with Dell servers since
we're about to buy some 750s, 2850s and 1850s.

The problems I've been dealing with have been on the *650s.   They're the ones 
you name.


FYI ... the 750s, 1850s and 2850s use Intel chipsets (E7520 on 1850s
and 2850s, 7210 on 750s), Intel NICs, and come only with LSI Logic
RAID controllers.  It looks like Dell has dropped the
Broadcom/ServerWorks and Adaptec junk.

I don't know if Vivek is on this list; I think he just had a critical failure 
with one of the new Dells with LSI.

On this note about "Adaptec junk", I have a question regarding hardware 
as well. We tend to build a lot of servers in-house (Supermicro based, 
with the zero-channel RAID). Does anyone have any anecdotal or empirical 
data on using a ZCR card versus a full-blown RAID controller (Adaptec or 
other)? I am trying to build a medium-duty database server with 8G RAM, 
4x144GB U320 SCSI RAID 10, FreeBSD (5.3-stable or 4-stable) and was 
wondering about performance differences between ZCR and Adaptec versus 
other manufacturers' full RAID cards (PCI-E).
Sven