Re: [PERFORM] Very important choice

2005-02-01 Thread Richard Huxton
Lago, Bruno Almeida do wrote:
> Hello my friends,
> I'd like to know (based on your experience and technical details) which OS
> is recommended for running PostgreSQL, keeping in mind 3 indicators:
> 1 - Performance (OS, network and I/O)
> 2 - OS stability
> 3 - File system integrity
The short answer is almost certainly whichever OS you are most familiar 
with. If you have a problem, you don't want to be learning new details 
about your OS while fixing it. That rules out FreeBSD for now.

What hardware you want to use will affect both performance and your choice 
of OS, so you'll need to decide what you're looking to run on.

As far as file-systems are concerned, ext3 seems to be the slowest, and 
the general feeling seems to be that XFS is perhaps the fastest. In 
terms of reliability, avoid cutting-edge releases of any file-system - 
let others test them for you. One thing to consider is how long it takes 
to recover from a crash - you can run PostgreSQL on ext2, but checking a 
large disk can take hours after a crash. That's the real benefit of 
journalling for PG - speed of recovery.

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system

2005-02-01 Thread Jim C. Nasby
On Tue, Feb 01, 2005 at 07:35:35AM +0100, Cosimo Streppone wrote:
> > You might look at Opterons, which theoretically have a higher data
> > bandwidth. If you're doing anything data intensive, like a sort in
> > memory, this could make a difference.
>
> Would Opteron systems need a 64-bit PostgreSQL (and OS, gcc, ...)
> build to have that advantage?
 
Well, that would give you the most benefit, but the memory bandwidth is
still greater than on a Xeon. There's really no issue with 64 bit if
you're using open source software; it all compiles for 64 bits and
you're good to go. http://stats.distributed.net runs on a dual opteron
box running FreeBSD and I've had no issues.

> > RAID10 will be faster than RAID1.
>
> Sorry Jim, by RAID10 you mean several RAID1 arrays mounted on
> different Linux partitions? Or several RAID1 arrays that
> build up a RAID0 array? In the latter case, who decides which
> data goes in which RAID1 array? The RAID adapter?

You should take a look around online for a description of raid types.

There's technically RAID0+1 and RAID1+0; one is a stripe of mirrored
drives (a RAID 0 built out of RAID 1s), the other is a mirror of two
RAID 0s. The former is much better; if you're lucky you can lose half
your drives without any data loss (if each dead drive is part of a
different mirror). Recovery is also faster.

You'll almost certainly be much happier with hardware raid instead of
software raid. stats.distributed.net runs a 3ware controller and SATA
drives.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



[PERFORM] Performance of count(*) on large tables vs SQL Server

2005-02-01 Thread Andrew Mayo
Doing some rather crude comparative performance tests
between PG 8.0.1 on Windows XP and SQL Server 2000, PG
whips SQL Server's ass on

insert into junk (select * from junk)

on a one column table defined as int.
If we start with a 1 row table and repeatedly execute
this command, PG can take the table from 500K rows to
1M rows in 20 seconds; SQL Server is at least twice as
slow.

BUT... 

SQL Server can do

select count(*) from junk

in almost no time at all, probably because this query
can be optimised to go back and use catalogue
statistics.

PG, on the other hand, appears to do a full table scan
to answer this question, taking nearly 4 seconds to
process the query.

Doing an ANALYZE on the table and also VACUUM did not
seem to affect this.

Can PG find a table's row count more efficiently?
This is not an unusual practice in commercial applications, which assume
that count(*) with no WHERE clause will be a cheap query - and use it to
test whether a table is empty, for instance (because for
Oracle/Sybase/SQL Server, count(*) is cheap).

(sure, I appreciate there are other ways of doing
this, but I am curious about the way PG works here).
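
For what it's worth, two of those cheaper alternatives can be sketched in
plain SQL (using the junk table from the test above; treat this as a sketch,
not a full equivalent, since reltuples is only an estimate):

    -- approximate row count, maintained by VACUUM/ANALYZE
    SELECT reltuples FROM pg_class WHERE relname = 'junk';

    -- cheap emptiness test: returns at most one row, no full scan needed
    SELECT 1 FROM junk LIMIT 1;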





Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system

2005-02-01 Thread Alex Turner
To be honest, I've used Compaq, Dell and LSI SCSI RAID controllers and
got pretty pathetic benchmarks from all of them.  The best system I
have is the one I just built:

2xOpteron 242, Tyan S2885 MoBo, 4GB Ram, 14xSATA WD Raptor drives:
2xRaid 1, 1x4 disk Raid 10, 1x6 drive Raid 10.  2x3ware (now AMCC)
Escalade 9500S-8MI.

This system with fsync on has managed 2500 insert transactions/sec
(granted they are simple transactions, but still).

RAID 10 is a stripe of mirrors.  RAID 10 gives you the best read and
write performance combined.  RAID 5 gives very bad write performance
but good read performance.  With RAID 5 you can only lose a single
drive, and rebuild times are slow.  RAID 10 can lose up to half the
array, depending on which drives fail, without losing data.

I would be interested in starting a site listing RAID benchmarks under
Linux.  If anyone is interested, let me know.  I would be interested in
at least some bonnie++ benchmarks, and perhaps others if people would
like.

Alex Turner
NetEconomist






Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system

2005-02-01 Thread Merlin Moncure
> Hi all,
> 1) What kind of performance gain can I expect switching from
> 7.1 to 7.4 (or 8.0)? Obviously I'm doing my own testing,
> but I'm not very impressed by 8.0 speed; maybe I'm doing
> testing on a low end server...

8.0 gives you savepoints.  While this may not seem like a big deal at
first, the ability to handle exceptions inside pl/pgsql functions gives
you much more flexibility to move code into the server.  Also, recent
versions of pl/pgsql give you more flexibility with cursors, including
returning them outside of the function.
Corollary: use pl/pgsql.  It can be 10 times or more faster than
query-by-query execution from the client.
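
As a minimal sketch of the kind of thing 8.0's exception handling allows
(the table, column and function names here are invented for illustration):

    CREATE OR REPLACE FUNCTION insert_if_new(p_id integer) RETURNS boolean AS $$
    BEGIN
        INSERT INTO mytable (id) VALUES (p_id);
        RETURN true;
    EXCEPTION WHEN unique_violation THEN
        -- the failed INSERT rolls back to an implicit savepoint;
        -- the surrounding transaction keeps going
        RETURN false;
    END;
    $$ LANGUAGE plpgsql;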

You also have the parse/bind interface.  This may not be so easy to
implement in your app, but if you are machine-gunning your server with
queries, use parameterized prepared queries and reap a 50%+ performance
gain, meaning lower load and quicker transaction turnaround time.
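
At the SQL level the same idea can be sketched with PREPARE/EXECUTE
(statement, table and column names are illustrative only):

    PREPARE get_order (integer) AS
        SELECT * FROM orders WHERE ordernumber = $1;

    EXECUTE get_order(42);  -- planned once, executed many times
    EXECUTE get_order(43);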


Merlin



[PERFORM] Why the difference in query plan and performance pg 7.4.6?

2005-02-01 Thread Joost Kraaijeveld
Hi all,

I have a freshly vacuumed table with 1104379 records with an index on zipcode.
Can anyone explain why the queries go as they go, and why the performance
differs so much (1 second versus 64 seconds or, stated differently, 10000
records per second versus 1562 records per second), and why the query plan of
query 2 ignores the index?

For completeness' sake I also did a select of ordernumber without any ordering.
That only took 98 seconds for 1104379 records (11222 records per second,
comparable with the first query, as I would have expected).

Query 1:
select a.ordernumer from orders a order by a.zipcode limit 10000
Explain:
QUERY PLAN
Limit  (cost=0.00..39019.79 rows=10000 width=14)
  ->  Index Scan using orders_postcode on orders a  (cost=0.00..4309264.07 rows=1104379 width=14)
Running time: 1 second

Query 2:
select a.ordernumer from orders a order by a.zipcode limit 100000
Explain:
QUERY PLAN
Limit  (cost=207589.75..207839.75 rows=100000 width=14)
  ->  Sort  (cost=207589.75..210350.70 rows=1104379 width=14)
        Sort Key: postcode
        ->  Seq Scan on orders a  (cost=0.00..46808.79 rows=1104379 width=14)
Running time: 64 seconds

Query 3:
select a.ordernumer from orders a
QUERY PLAN
Seq Scan on orders a  (cost=0.00..46808.79 rows=1104379 width=4)
Running time: 98 seconds

Regards,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 



Re: [PERFORM] Why the difference in query plan and performance pg 7.4.6?

2005-02-01 Thread John Arbash Meinel
Joost Kraaijeveld wrote:
> Hi all,
> I have a freshly vacuumed table with 1104379 records with an index on zipcode.
> Can anyone explain why the queries go as they go, and why the performance
> differs so much (1 second versus 64 seconds or, stated differently, 10000
> records per second versus 1562 records per second), and why the query plan of
> query 2 ignores the index?


Indexes are generally only faster if you are grabbing less than about 10%
of the table. Otherwise you have the overhead of loading the index into
memory, and then paging through it looking for the entries.
With 100,000 entries to fetch, a sequential scan is actually likely to be
faster than an index scan.
If you try:
select a.ordernumer from orders a order by a.zipcode
how long does it take?
You can also try disabling sequential scans to see how long Query 2 would
take if it used the index. Remember, though, that because of caching, a
repeated index scan may seem faster, but in actual production that
index may not be cached, depending on what other queries are run.
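
A quick sketch of what that looks like (run in the same session; Query 2 is
the one from the original message):

    SET enable_seqscan = off;
    EXPLAIN ANALYZE
        SELECT a.ordernumer FROM orders a ORDER BY a.zipcode LIMIT 100000;
    SET enable_seqscan = on;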
John
=:->






Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system

2005-02-01 Thread Cosimo Streppone
Merlin Moncure wrote:
> Corollary: use pl/pgsql.  It can be 10 times or more faster than
> query-by-query execution from the client.
Merlin, thanks for your good suggestions.
Up to now our system has never used a stored-procedure approach,
because we're staying on the minimum common SQL features
that are supported by most db engines.
I realize, though, that it would provide a heavy performance boost.
> You also have the parse/bind interface
This is something I have already engineered in our core classes
(which use DBI + DBD::Pg), so that switching to 8.0 should
automatically enable the single-prepare, multiple-execute behavior,
saving a lot of query planner processing, if I understand correctly.
--
Cosimo


Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system

2005-02-01 Thread Cosimo Streppone
Alex Turner wrote:
> To be honest, I've used Compaq, Dell and LSI SCSI RAID controllers and
> got pretty pathetic benchmarks from all of them.
I have also seen average-to-low results for LSI (at least the 1020 card).
> 2xOpteron 242, Tyan S2885 MoBo, 4GB Ram, 14xSATA WD Raptor drives:
> 2xRaid 1, 1x4 disk Raid 10, 1x6 drive Raid 10.  2x3ware (now AMCC)
> Escalade 9500S-8MI.
Thanks, this is valuable information.
> I would be interested in starting a site listing RAID benchmarks under
> Linux.  If anyone is interested, let me know.  I would be interested in
> at least some bonnie++ benchmarks, and perhaps others if people would
> like.
I have also used tiobench [http://tiobench.sourceforge.net/].
Any experience with it?
--
Cosimo


[PERFORM] horizontal partition

2005-02-01 Thread Gaetano Mendola
Hi all,
I have a big table with ~10 million rows, and it is a real pain to
administer, so after years I convinced myself to partition it and
replace the table usage (only for reading) with a view.
Now my user_logs table is split into 4:
user_logs
user_logs_2002
user_logs_2003
user_logs_2004
and the view v_user_logs is built on top of these tables:
CREATE OR REPLACE VIEW v_user_logs AS
  SELECT * FROM user_logs
  UNION ALL
  SELECT * FROM user_logs_2002
  UNION ALL
  SELECT * FROM user_logs_2003
  UNION ALL
  SELECT * FROM user_logs_2004
;
the view is performing really well:
empdb=# explain analyze select * from v_user_logs where id_user = 
sp_id_user('kalman');
 
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan v_user_logs  (cost=0.00..895.45 rows=645 width=88) (actual time=17.039..2345.388 rows=175 loops=1)
   ->  Append  (cost=0.00..892.23 rows=645 width=67) (actual time=17.030..2344.195 rows=175 loops=1)
         ->  Subquery Scan "*SELECT* 1"  (cost=0.00..120.70 rows=60 width=67) (actual time=17.028..17.036 rows=1 loops=1)
               ->  Index Scan using idx_user_user_logs on user_logs  (cost=0.00..120.40 rows=60 width=67) (actual time=17.012..17.018 rows=1 loops=1)
                     Index Cond: (id_user = 4185)
         ->  Subquery Scan "*SELECT* 2"  (cost=0.00..475.44 rows=316 width=67) (actual time=49.406..1220.400 rows=79 loops=1)
               ->  Index Scan using idx_user_user_logs_2004 on user_logs_2004  (cost=0.00..473.86 rows=316 width=67) (actual time=49.388..1219.386 rows=79 loops=1)
                     Index Cond: (id_user = 4185)
         ->  Subquery Scan "*SELECT* 3"  (cost=0.00..204.33 rows=188 width=67) (actual time=59.375..1068.806 rows=95 loops=1)
               ->  Index Scan using idx_user_user_logs_2003 on user_logs_2003  (cost=0.00..203.39 rows=188 width=67) (actual time=59.356..1067.934 rows=95 loops=1)
                     Index Cond: (id_user = 4185)
         ->  Subquery Scan "*SELECT* 4"  (cost=0.00..91.75 rows=81 width=67) (actual time=37.623..37.623 rows=0 loops=1)
               ->  Index Scan using idx_user_user_logs_2002 on user_logs_2002  (cost=0.00..91.35 rows=81 width=67) (actual time=37.618..37.618 rows=0 loops=1)
                     Index Cond: (id_user = 4185)
 Total runtime: 2345.917 ms
(15 rows)

The problem now is when this view is used in other views, like this:

CREATE OR REPLACE VIEW v_ua_user_login_logout_tmp AS
  SELECT
 u.login,
 ul.*
  FROM user_login  u,
   v_user_logs ul
  WHERE
   u.id_user = ul.id_user
;

empdb=# explain analyze select * from v_ua_user_login_logout_tmp where login = 
'kalman';
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=4.01..228669.81 rows=173 width=100) (actual time=1544.784..116490.363 rows=175 loops=1)
   Hash Cond: ("outer".id_user = "inner".id_user)
   ->  Subquery Scan ul  (cost=0.00..193326.71 rows=7067647 width=88) (actual time=5.677..108190.096 rows=7067831 loops=1)
         ->  Append  (cost=0.00..157988.47 rows=7067647 width=67) (actual time=5.669..77109.995 rows=7067831 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..8158.48 rows=362548 width=67) (actual time=5.666..3379.178 rows=362862 loops=1)
                     ->  Seq Scan on user_logs  (cost=0.00..6345.74 rows=362548 width=67) (actual time=5.645..1395.673 rows=362862 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..93663.88 rows=4191588 width=67) (actual time=9.149..35094.798 rows=4191580 loops=1)
                     ->  Seq Scan on user_logs_2004  (cost=0.00..72705.94 rows=4191588 width=67) (actual time=9.117..16531.486 rows=4191580 loops=1)
               ->  Subquery Scan "*SELECT* 3"  (cost=0.00..44875.33 rows=2008233 width=67) (actual time=0.562..24017.680 rows=2008190 loops=1)
                     ->  Seq Scan on user_logs_2003  (cost=0.00..34834.17 rows=2008233 width=67) (actual time=0.542..13224.265 rows=2008190 loops=1)
               ->  Subquery Scan "*SELECT* 4"  (cost=0.00..11290.78 rows=505278 width=67) (actual time=7.100..3636.163 rows=505199 loops=1)
                     ->  Seq Scan on user_logs_2002  (cost=0.00..8764.39 rows=505278 width=67) (actual time=6.446..1474.709 rows=505199 loops=1)
   ->  Hash  (cost=4.00..4.00 rows=1 width=16) (actual time=0.083..0.083 rows=0 loops=1)
         ->  Index Scan using user_login_login_key on user_login u  (cost=0.00..4.00 rows=1 width=16) (actual time=0.064..0.066 rows=1 loops=1)
               Index Cond: ((login)::text = 'kalman'::text)
 Total runtime: 116491.056 ms
(16 rows)

as you can see, the index scan is not used anymore. Do you have any suggestions?
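
For illustration, one possible workaround (a sketch only, untested against
this schema, reusing the sp_id_user() helper from the first query) is to
hand the view the resolved id_user directly, so the condition can reach
each UNION ALL branch, instead of leaving only the join condition:

    SELECT u.login, ul.*
    FROM user_login u, v_user_logs ul
    WHERE ul.id_user = sp_id_user('kalman')
      AND u.id_user = ul.id_user;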

Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system

2005-02-01 Thread William Yu
Jim C. Nasby wrote:
> On Tue, Feb 01, 2005 at 07:35:35AM +0100, Cosimo Streppone wrote:
> > > You might look at Opterons, which theoretically have a higher data
> > > bandwidth. If you're doing anything data intensive, like a sort in
> > > memory, this could make a difference.
> >
> > Would Opteron systems need a 64-bit PostgreSQL (and OS, gcc, ...)
> > build to have that advantage?
>
> Well, that would give you the most benefit, but the memory bandwidth is
> still greater than on a Xeon. There's really no issue with 64 bit if
> you're using open source software; it all compiles for 64 bits and
> you're good to go. http://stats.distributed.net runs on a dual opteron
> box running FreeBSD and I've had no issues.
You can get 64-bit Xeons also, but they take a hit in the I/O department due 
to the lack of a hardware I/O MMU, which limits DMA transfers to 
addresses below 4GB. This has a two-fold impact:

1) Transferring data to addresses above 4GB requires first a transfer to an 
address below 4GB and then a copy to the final destination.

2) You must allocate real memory 2X the address space of the devices to 
act as bounce buffers. This is especially problematic for workstations 
because if you put a 512MB Nvidia card in your computer for graphics 
work -- you've just lost 1GB of memory. (I dunno how much the typical 
SCSI/NIC/etc take up.)



Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system

2005-02-01 Thread Alex Turner
None - but I'll definitely take a look.

Alex Turner
NetEconomist


On Tue, 01 Feb 2005 22:11:30 +0100, Cosimo Streppone
[EMAIL PROTECTED] wrote:
> I have also used tiobench [http://tiobench.sourceforge.net/].
> Any experience with it?

