Re: [PERFORM] Reliability recommendations

2006-02-25 Thread Ron

At 11:41 PM 2/24/2006, Luke Lonergan wrote:

Dan,

On 2/24/06 4:47 PM, Dan Gorman [EMAIL PROTECTED] wrote:

 Was that sequential reads? If so, yeah you'll get 110MB/s? How big
 was the datafile size? 8MB? Yeah, you'll get 110MB/s. 2GB? No, they
 can't sustain that. There are so many details missing from this test
 that it's hard to have any context around it :)

 I was getting about 40-50MB/s on a PV with 14 disks on a RAID10 in
 real world usage. (random IO and fully saturating a Dell 1850 with 4
 concurrent threads (to peg the cpu on selects) and raw data files)

OK, how about some proof?

In a synthetic test that writes 32GB of sequential 8k pages on a machine
with 16GB of RAM:
= Write test results ==
time bash -c dd if=/dev/zero of=/dbfast1/llonergan/bigfile bs=8k
count=200  sync 
time bash -c dd if=/dev/zero of=/dbfast3/llonergan/bigfile bs=8k
count=200  sync 

200++0 records in
200++0 records out
200++0 records in
200++0 records out

real1m0.046s
user0m0.270s
sys 0m30.008s

real1m0.047s
user0m0.287s
sys 0m30.675s

So that's 32,000 MB written in 60.05 seconds, which is 533MB/s sustained
with two threads.

Now to read the same files in parallel:
= Read test results ==
sync
time dd of=/dev/null if=/dbfast1/llonergan/bigfile bs=8k 
time dd of=/dev/null if=/dbfast3/llonergan/bigfile bs=8k 

200++0 records in
200++0 records out

real0m39.849s
user0m0.282s
sys 0m22.294s
200++0 records in
200++0 records out

real0m40.410s
user0m0.251s
sys 0m22.515s

And that's 32,000MB in 40.4 seconds, or 792MB/s sustained from disk (not
memory).

These are each RAID5 arrays of 8 internal SATA disks on 3Ware HW RAID
controllers.


Impressive IO rates.  A more detailed HW list would help put them in context.

Which 3Ware?  The 9550SX?  How much cache on it (AFAIK, the only 
options are 128MB and 256MB?)?


Which HDs?

What CPUs (looks like Opterons, but which flavor?) and mainboard?

What's CPU utilization when hammering the physical IO subsystem this hard?



Now for real usage, let's run a simple sequential scan query on 123,434 MB
of data in a single table on 4 of these machines in parallel.  All tables
are distributed evenly by Bizgres MPP over all 8 filesystems:

= Bizgres MPP sequential scan results =

[EMAIL PROTECTED] +AH4]$ !psql
psql -p  -U mppdemo1 demo
Welcome to psql 8.1.1 (server 8.1.3), the PostgreSQL interactive terminal.

Type:  +AFw-copyright for distribution terms
   +AFw-h for help with SQL commands
   +AFw? for help with psql commands
   +AFw-g or terminate with semicolon to execute query
   +AFw-q to quit

demo=# +AFw-timing
Timing is on.
demo=# select version();

version


-
 PostgreSQL 8.1.3 (Bizgres MPP 2.1) on x86_64-unknown-linux-gnu, compiled by
GCC gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) compiled on Feb 23 2006
11:34:06
(1 row)

Time: 0.570 ms
demo=# select relname,8*relpages/128 as MB from pg_class order by relpages
desc limit 6;
relname |   mb
++
 lineitem   | 123434
 orders |  24907
 partsupp   |  14785
 part   |   3997
 customer   |   3293
 supplier   |202
(6 rows)

Time: 1.824 ms
demo=# select count(*) from lineitem;
   count
---
 600037902
(1 row)

Time: 60300.960 ms

So that's 123,434 MB of data scanned in 60.3 seconds, or 2,047 MB/s on 4
machines, which uses 512MB/s of disk bandwidth on each machine.

Now let's do a query that uses a this big table (a two way join) using all 4
machines:
= Bizgres MPP Query results =
demo=# select
demo-# sum(l_extendedprice* (1 - l_discount)) as revenue
demo-# from
demo-# lineitem,
demo-# part
demo-# where
demo-# (
demo(# p_partkey = l_partkey
demo(# and p_brand = 'Brand#42'
demo(# and p_container in ('SM CASE', 'SM BOX', 'SM PACK',
'SM PKG')
demo(# and l_quantity = 7 and l_quantity = 7 ++ 10
demo(# and p_size between 1 and 5
demo(# and l_shipmode in ('AIR', 'AIR REG')
demo(# and l_shipinstruct = 'DELIVER IN PERSON'
demo(# )
demo-# or
demo-# (
demo(# p_partkey = l_partkey
demo(# and p_brand = 'Brand#15'
demo(# and p_container in ('MED BAG', 'MED BOX', 'MED PKG',
'MED PACK')
demo(# and l_quantity = 14 and l_quantity = 14 ++ 10
demo(# and p_size between 1 and 10
demo(#

Re: [PERFORM] Reliability recommendations

2006-02-25 Thread Luke Lonergan
Ron,

On 2/25/06 3:24 AM, Ron [EMAIL PROTECTED] wrote:

 These are each RAID5 arrays of 8 internal SATA disks on 3Ware HW RAID
 controllers.
 
 Impressive IO rates.  A more detailed HW list would help put them in context.
 
 Which 3Ware?  The 9550SX?  How much cache on it (AFAIK, the only
 options are 128MB and 256MB?)?
 
 Which HDs?
 
 What CPUs (looks like Opterons, but which flavor?) and mainboard?
 
 What's CPU utilization when hammering the physical IO subsystem this hard?

OK.  There are four machines. Each machine has:
Qty 2 of 3Ware 9550SX/PCIX/128MB cache SATAII RAID controllers
Qty 2 of AMD Opteron 250 CPUs (2.4 GHz)
Qty 16 of 1GB PC3200 RAM (16GB total)
Qty 1 of Tyan 2882S Motherboard
Qty 16 Western Digital 400GB Raid Edition 2 SATA disks
Cost: About $10,000 each

They are connected together using a Netgear 48 port Gigabit Ethernet switch
and copper CAT6 cables.
Cost: About $1,500

Total of all machines:
8 CPUs
64GB RAM
64 Disks
20TB storage in RAID5
Total HW cost: About $45,000.

CPU utilization is apparently (adding up usr+system and dividing by real):
Writing at 2,132MB/s: 51%
Reading at 3,168MB/s: 56%

 revenue
 
  356492404.3164
 (1 row)
 
 Time: 114908.149 ms
 Hmmm.  ~115secs @ ~500MBps = ~57.5GB of data manipulated.

Actually, this query sequential scans all of both lineitem and part, so it¹s
accessing 127.5GB of data, and performing the work of the hash join, in
about double the scan time.  That¹s possible because we¹re using all 8 CPUs,
4 network interfaces and the 64 disks while we are performing the query:

 Aggregate  (cost=3751996.97..3751996.99 rows=1 width=22)
   -  Gather Motion  (cost=3751996.97..3751996.99 rows=1 width=22)
 -  Aggregate  (cost=3751996.97..3751996.99 rows=1 width=22)
   -  Hash Join  (cost=123440.49..3751993.62 rows=1339
width=22)
 Hash Cond: (outer.l_partkey = inner.p_partkey)
 Join Filter: (((inner.p_brand = 'Brand#42'::bpchar)
AND ((inner.p_container = 'SM CASE'::bpchar) OR (inner.p_container = 'SM
BOX'::bpchar) OR (inner.p_container = 'SM PACK'::bpchar) OR
(inner.p_container = 'SM PKG'::bpchar)) AND (outer.l_quantity =
7::numeric) AND (outer.l_quantity = 17::numeric) AND (inner.p_size =
5)) OR ((inner.p_brand = 'Brand#15'::bpchar) AND ((inner.p_container =
'MED BAG'::bpchar) OR (inner.p_container = 'MED BOX'::bpchar) OR
(inner.p_container = 'MED PKG'::bpchar) OR (inner.p_container = 'MED
PACK'::bpchar)) AND (outer.l_quantity = 14::numeric) AND
(outer.l_quantity = 24::numeric) AND (inner.p_size = 10)) OR
((inner.p_brand = 'Brand#53'::bpchar) AND ((inner.p_container = 'LG
CASE'::bpchar) OR (inner.p_container = 'LG BOX'::bpchar) OR
(inner.p_container = 'LG PACK'::bpchar) OR (inner.p_container = 'LG
PKG'::bpchar)) AND (outer.l_quantity = 22::numeric) AND
(outer.l_quantity = 32::numeric) AND (inner.p_size = 15)))
 -  Redistribute Motion  (cost=0.00..3340198.25
rows=2611796 width=36)
   Hash Key: l_partkey
   -  Seq Scan on lineitem  (cost=0.00..3287962.32
rows=2611796 width=36)
 Filter: (((l_shipmode = 'AIR'::bpchar) OR
(l_shipmode = 'AIR REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN
PERSON'::bpchar))
 -  Hash  (cost=95213.75..95213.75 rows=2500296
width=36)
   -  Seq Scan on part  (cost=0.00..95213.75
rows=2500296 width=36)
 Filter: (p_size = 1)
(13 rows)
 
 Time: 424981.813 ms
 ...and this implies ~425secs @ ~500MBps = 212.5GB

I've attached this explain plan because it's large.  Same thing applies as
in previous, all of the involved tables are scanned, and in this case we've
got all manner of CPU work being performed: sorts, hashes, aggregations,...

 What are the IO rates during these joins?

They burst then disappear during the progression of the queries.  These are
now CPU bound because the I/O available to each CPU is so fast.

 How much data is being handled to complete these joins?

See above, basically all of it because there are no indexes.
 
 How much data is being exchanged between these machines to complete the joins?

Good question, you can infer it from reading the EXPLAIN plans, but the
stats are not what they appear - generally multiply them by 8.
 
 What is the connectivity between these 4 machines?

See above.
 
 Putting these numbers in context may help the advocacy effort
 considerably as well as help us improve things even further. ;-)

Thanks!  (bait bait) I can safely say that there have never been query
speeds with Postgres this fast, not even close. (bait bait)

What's next is a machine 4 or maybe 8 times this fast.

Explain for second query is attached, compressed with gzip (it¹s kinda
huge).

- Luke




q2-explain.txt.gz
Description: Binary data

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Bruce Momjian
Christopher Browne wrote:
 After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Joshua D. 
 Drake) belched out:
  Jeremy Haile wrote:
  We are a small company looking to put together the most cost effective
  solution for our production database environment.  Currently in
  production Postgres 8.1 is running on this machine:
 
  Dell 2850
  2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache
  4 GB DDR2 400 Mhz
  2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
  4 x 146 GB 10K SCSI RAID 10 (for postgres data)
  Perc4ei controller
 
  The above is a standard Dell box with nothing added or modified beyond
  the options available directly through Dell.
 
  You should probably review the archives for PostgreSQL user
  experience with Dell's before you purchase one.
 
 Hear, hear!  We found Dell servers were big-time underperformers.
 
 Generic hardware put together with generally the same brand names of
 components (e.g. - for SCSI controllers and such) would generally play
 much better.
 
 For the cheapo desktop boxes they obviously have to buy the cheapest
 hardware available this week; it sure seems as though they engage in
 the same sort of thing with the server class hardware.
 
 I don't think anyone has been able to forcibly point out any
 completely precise shortcoming; just that they underperform what the
 specs suggest they ought to be able to provide.

Dell often says part X is included, but part X is not the exact same as
part X sold by the original manufacturer.  To hit a specific price
point, Dell is willing to strip thing out of commodity hardware, and
often does so even when performance suffers.  For many people, this is
unacceptable.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Craig A. James

Bruce Momjian wrote:

Dell often says part X is included, but part X is not the exact same as
part X sold by the original manufacturer.  To hit a specific price
point, Dell is willing to strip thing out of commodity hardware, and
often does so even when performance suffers.  For many people, this is
unacceptable.


I find this strains credibility, that this major manufacturer of PC's would do something 
deceptive that hurts performance, when it would be easily detected and widely reported.  
Can anyone cite a specific instances where this has happened?  Such as, I bought 
Dell model XYZ, which was advertised to have these parts and these specs, but in fact had 
these other parts and here are the actual specs.

Dell seems to take quite a beating in this forum, and I don't recall seeing any other 
manufacturer blasted this way.  Is it that they are deceptive, or simply that their 
servers are designed to be office servers, not database servers?

There's nothing wrong with Dell designing their servers for a different market 
than ours; they need to go for the profits, and that may not include us.  But 
it's not fair for us to claim Dell is being deceptive unless we have concrete 
evidence.

Craig


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Luke Lonergan
Bruce,

On 2/24/06 7:14 AM, Luke Lonergan [EMAIL PROTECTED] wrote:

 So, if you want RAID5, these machines work for me.  The lack of RAID 10
 could knock them out of contention for people.

Sorry in advance for the double post, but there's some more information on
this, which altogether demonstrates why people get frustrated with Dell IMO.

Later on in the article I cited, the same person who listed Dell's technical
description of their strange, not really RAID10 support, he mentions that he
called LSI directly to find out the real scoop.  He says that they (LSI)
told him that the LSI controller does support real RAID10, so he concluded
that the Dell could do it too.

However, Dell's documentation seems unambiguous to me, and matches our
direct experience.  Also, more online documentation from Dell reinforces
this.  See their definition of both RAID10 and spanning in this
backgrounder: 
http://support.dell.com/support/edocs/storage/RAID/RAIDbk0.pdf.  Seems that
they are clear about what striping and spanning do, and what RAID10 does,
making their delineation of the standard RAID10 support versus what some of
their controllers do (all of the PERC 4 series) pretty clear.

So, I'd conclude at this point that Dell seems to have implemented a RAID
BIOS that does not allow true RAID10 or RAID50 on their embedded LSI
adapters, which could otherwise support RAID10/50.  They have done this
intentionally, for some reason unknown to me and it seems to some people at
LSI as well.

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Scott Marlowe
On Fri, 2006-02-24 at 10:27, Vivek Khera wrote:
 On Feb 24, 2006, at 9:29 AM, Bruce Momjian wrote:
 
  Dell often says part X is included, but part X is not the exact  
  same as
  part X sold by the original manufacturer.  To hit a specific price
  point, Dell is willing to strip thing out of commodity hardware, and
  often does so even when performance suffers.  For many people, this is
  unacceptable.
 
 The last dell box I bought, a PE1850, came with a PERC 4e/Si card,  
 which I believe is the same as the card the OP was looking at.  It is  
 very fast in RAID1 with two U320 disks.

My bad experiences were with the 2600 series machines.  We now have some
2800 and they're much better than the 2600/2650s I've used in the past.

That said, I've not tried to do anything too high end with them, like
RAID 1+0 or anything.

My past experience has been that Intel and SuperMicro make much better
white boxes you can get from almost any halfway decent local
supplier.  They're faster, more reliable, and the parts are easy to get.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Vivek Khera


On Feb 24, 2006, at 11:32 AM, Scott Marlowe wrote:

My bad experiences were with the 2600 series machines.  We now have  
some
2800 and they're much better than the 2600/2650s I've used in the  
past.


Yes, the 2450 and 2650 were CRAP disk performers.  I haven't any 2850  
to compare, just an 1850.




smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Scott Marlowe
On Fri, 2006-02-24 at 10:40, Vivek Khera wrote:
 On Feb 24, 2006, at 11:32 AM, Scott Marlowe wrote:
 
  My bad experiences were with the 2600 series machines.  We now have  
  some
  2800 and they're much better than the 2600/2650s I've used in the  
  past.
 
 Yes, the 2450 and 2650 were CRAP disk performers.  I haven't any 2850  
 to compare, just an 1850.
 
And the real problem with the 2650s we have now is that under very heavy
load, they just lock up.  all of them, latest BIOS updates, etc... 
nothing makes them stable.  Some things make them a little less dodgy,
but they are never truly reliable.  Something the Intel and Supermicro
white boxes have always been for me.

I don't want support from a company like Dell, I want a reliable
machine.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Joshua D. Drake


I find this strains credibility, that this major manufacturer of PC's 
would do something deceptive that hurts performance, when it would be 
easily detected and widely reported.  Can anyone cite a specific 
instances where this has happened?  Such as, I bought Dell model XYZ, 
which was advertised to have these parts and these specs, but in fact 
had these other parts and here are the actual specs.

I can :)

Feb 20 07:33:52 master kernel: [4294682.803000]   Vendor: MegaRAID  
Model: LD 0 RAID1   51G  Rev: 196T
Feb 20 07:33:52 master kernel: [4294682.803000]   Type:   
Direct-Access  ANSI SCSI revision: 02

Feb 20 07:33:52 master kernel: [4294682.818000] tg3.c:v3.31 (June 8, 2005)
Feb 20 07:33:52 master kernel: [4294682.818000] ACPI: PCI Interrupt 
:0a:01.0[A] - GSI 17 (level, low) - IRQ 17
Feb 20 07:33:52 master kernel: [4294683.51] eth0: Tigon3 
[partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 
10/100/1000BaseT Ethernet 00:0f:1f:6e:01:f
Feb 20 07:33:52 master kernel: [4294683.51] eth0: RXcsums[1] 
LinkChgREG[1] MIirq[1] ASF[0] Split[0] WireSpeed[0] TSOcap[0]

Feb 20 07:33:52 master kernel: [4294683.51] eth0: dma_rwctrl[76ff000f]
Feb 20 07:33:52 master kernel: [4294683.51] ACPI: PCI Interrupt 
:0a:02.0[A] - GSI 18 (level, low) - IRQ 18
Feb 20 07:33:52 master kernel: [4294684.203000] eth1: Tigon3 
[partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 
10/100/1000BaseT Ethernet 00:0f:1f:6e:01:f
Feb 20 07:33:52 master kernel: [4294684.203000] eth1: RXcsums[1] 
LinkChgREG[1] MIirq[1] ASF[0] Split[0] WireSpeed[0] TSOcap[0]

Feb 20 07:33:52 master kernel: [4294684.203000] eth1: dma_rwctrl[76ff000f]
Feb 20 07:33:52 master kernel: [4294686.228000] SCSI device sda: 
106168320 512-byte hdwr sectors (54358 MB)
Feb 20 07:33:52 master kernel: [4294686.228000] SCSI device sda: 
106168320 512-byte hdwr sectors (54358 MB)
Feb 20 07:33:52 master kernel: [4294686.228000]  
/dev/scsi/host0/bus2/target0/lun0: p1 p2  p5  p3
Feb 20 07:33:52 master kernel: [4294686.243000] Attached scsi disk sda 
at scsi0, channel 2, id 0, lun 0
Feb 20 07:33:52 master kernel: [4294686.578000] ACPI: CPU0 (power 
states: C1[C1])

Feb 20 07:33:52 master kernel: [4294686.655000] Attempting manual resume
Feb 20 07:33:52 master kernel: [4294686.659000] swsusp: Suspend 
partition has wrong signature?
Feb 20 07:33:52 master kernel: [4294686.671000] kjournald starting.  
Commit interval 5 seconds
Feb 20 07:33:52 master kernel: [4294686.671000] EXT3-fs: mounted 
filesystem with ordered data mode.
Feb 20 07:33:52 master kernel: [4294687.327000] md: md driver 0.90.1 
MAX_MD_DEVS=256, MD_SB_DISKS=27
Feb 20 07:33:52 master kernel: [4294688.633000] Adding 3903784k swap on 
/dev/sda5.  Priority:-1 extents:1
Feb 20 07:33:52 master kernel: [4294688.705000] EXT3 FS on sda3, 
internal journal
Feb 20 07:33:52 master kernel: [4294692.533000] lp: driver loaded but no 
devices found
Feb 20 07:33:52 master kernel: [4294692.557000] mice: PS/2 mouse device 
common for all mice
Feb 20 07:33:52 master kernel: [4294695.263000] device-mapper: 
4.4.0-ioctl (2005-01-12) initialised: [EMAIL PROTECTED]
Feb 20 07:33:52 master kernel: [4294695.479000] kjournald starting.  
Commit interval 5 seconds
Feb 20 07:33:52 master kernel: [4294695.479000] EXT3 FS on sda1, 
internal journal
Feb 20 07:33:52 master kernel: [4294695.479000] EXT3-fs: mounted 
filesystem with ordered data mode.
Feb 20 07:33:52 master kernel: [4294695.805000] Linux agpgart interface 
v0.101 (c) Dave Jones
Feb 20 07:33:52 master kernel: [4294696.071000] piix4_smbus 
:00:0f.0: Found :00:0f.0 device
Feb 20 07:33:52 master kernel: [4294696.584000] pci_hotplug: PCI Hot 
Plug PCI Core version: 0.5



processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Xeon(TM) MP CPU 2.20GHz
stepping: 6
cpu MHz : 2194.056
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr

bogomips: 4325.37



This machine... if you run it in raid 5 will only get 7-9 megabytes a 
second READ! performance. That is with 6 SCSI drives.
If you run it in RAID 10 you get a more reasonable 50-55 megabytes per 
second.


I don't have it sitting in front of me or I would give you an exact 
model number.


This machine also uses the serverworks chipset which is known to be a 
catastrophe.


Joshua D. Drake






Dell seems to take quite a beating in this forum, and I don't recall 
seeing any other manufacturer blasted this way.  Is it that they are 
deceptive, or simply that their servers are designed to be office 
servers, not database servers?


There's nothing wrong with Dell designing their servers for a 

Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Craig A. James

Joshua D. Drake wrote:
I find this strains credibility, that this major manufacturer of PC's 
would do something deceptive that hurts performance, when it would be 
easily detected and widely reported.  Can anyone cite a specific 
instances where this has happened?  Such as, I bought Dell model XYZ, 
which was advertised to have these parts and these specs, but in fact 
had these other parts and here are the actual specs.


I can :)

Feb 20 07:33:52 master kernel: [4294682.803000]   Vendor: MegaRAID  
Model: LD 0 RAID1   51G  Rev: 196T

--- snip ---
This machine... if you run it in raid 5 will only get 7-9 megabytes a 
second READ! performance. That is with 6 SCSI drives.
If you run it in RAID 10 you get a more reasonable 50-55 megabytes per 
second.


But you don't say how this machine was advertised.  Are there components in 
that list that were not as advertised?  Was the machine advertised as capable 
of RAID 5?  Were performance figures published for RAID 5?

If Dell advertised that the machine could do what you asked, then you're right 
-- they screwed you.  But if it was designed for and advertised to a different 
market, then I've made my point: People are blaming Dell for something that's 
not their fault.

Craig

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Luke Lonergan
Joshua,

On 2/24/06 9:19 AM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 This machine... if you run it in raid 5 will only get 7-9 megabytes a
 second READ! performance. That is with 6 SCSI drives.
 If you run it in RAID 10 you get a more reasonable 50-55 megabytes per
 second.
 
 I don't have it sitting in front of me or I would give you an exact
 model number.
 
 This machine also uses the serverworks chipset which is known to be a
 catastrophe.

I'd be more shocked if this weren't also true of nearly all SCSI HW RAID
adapters of this era.  If you had ordered an HP DL380 server you'd get about
the same performance.

BTW - I don't think there's anything reasonable about 50-55 MB/s from 6
disks, I'd put the minimum for this era machine at 5 x 30 = 150MB/s.

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Philippe Marzin




Do you have a hw reference that runs that fast (5 x 30 = 150MB/s) ?


Luke Lonergan a crit:

  Joshua,

On 2/24/06 9:19 AM, "Joshua D. Drake" [EMAIL PROTECTED] wrote:

  
  
This machine... if you run it in raid 5 will only get 7-9 megabytes a
second READ! performance. That is with 6 SCSI drives.
If you run it in RAID 10 you get a more reasonable 50-55 megabytes per
second.

I don't have it sitting in front of me or I would give you an exact
model number.

This machine also uses the serverworks chipset which is known to be a
catastrophe.

  
  
I'd be more shocked if this weren't also true of nearly all SCSI HW RAID
adapters of this era.  If you had ordered an HP DL380 server you'd get about
the same performance.

BTW - I don't think there's anything reasonable about 50-55 MB/s from 6
disks, I'd put the minimum for this era machine at 5 x 30 = 150MB/s.

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly



  





Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Dan Gorman

All,

Was that sequential reads? If so, yeah you'll get 110MB/s? How big  
was the datafile size? 8MB? Yeah, you'll get 110MB/s. 2GB? No, they  
can't sustain that. There are so many details missing from this test  
that it's hard to have any context around it :)


I was getting about 40-50MB/s on a PV with 14 disks on a RAID10 in  
real world usage. (random IO and fully saturating a Dell 1850 with 4  
concurrent threads (to peg the cpu on selects) and raw data files)


Best Regards,
Dan Gorman

On Feb 24, 2006, at 4:29 PM, Mark Kirkwood wrote:


Luke Lonergan wrote:

I'd be more shocked if this weren't also true of nearly all SCSI  
HW RAID
adapters of this era.  If you had ordered an HP DL380 server you'd  
get about

the same performance.
BTW - I don't think there's anything reasonable about 50-55 MB/s  
from 6

disks, I'd put the minimum for this era machine at 5 x 30 = 150MB/s.


He was quoting for 6 disk RAID 10 - I'm thinking 3 x 30MB/s = 90MB/ 
s is probably more correct? Having aid that, your point is still  
completely correct - the performance @55MB/s is poor (e.g. my *ata*  
system with 2 disk RAID0 does reads @110MB/s).


cheers

Mark

---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

  http://archives.postgresql.org



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Mark Kirkwood

Dan Gorman wrote:

All,

Was that sequential reads? If so, yeah you'll get 110MB/s? How big  was 
the datafile size? 8MB? Yeah, you'll get 110MB/s. 2GB? No, they  can't 
sustain that. There are so many details missing from this test  that 
it's hard to have any context around it :)




Actually they can. Datafile size was 8G, machine had 2G RAM (i.e. 
datafile 4 times memory). The test was for a sequential read with 8K 
blocks. I believe this is precisely the type of test that the previous 
posters were referring to - while clearly, its not a real-world measure, 
we are comparing like to like, and as such terrible results on such a 
simple test are indicative of something 'not right'.


regards

Mark

P.s. FWIW - I'm quoting a test from a few years ago - the (same) machine 
now has 4 RAID0 ata disks and does 175MB/s on the same test





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Luke Lonergan
Dan,

On 2/24/06 4:47 PM, Dan Gorman [EMAIL PROTECTED] wrote:

 Was that sequential reads? If so, yeah you'll get 110MB/s? How big
 was the datafile size? 8MB? Yeah, you'll get 110MB/s. 2GB? No, they
 can't sustain that. There are so many details missing from this test
 that it's hard to have any context around it :)
 
 I was getting about 40-50MB/s on a PV with 14 disks on a RAID10 in
 real world usage. (random IO and fully saturating a Dell 1850 with 4
 concurrent threads (to peg the cpu on selects) and raw data files)

OK, how about some proof?

In a synthetic test that writes 32GB of sequential 8k pages on a machine
with 16GB of RAM:
= Write test results ==
time bash -c dd if=/dev/zero of=/dbfast1/llonergan/bigfile bs=8k
count=200  sync 
time bash -c dd if=/dev/zero of=/dbfast3/llonergan/bigfile bs=8k
count=200  sync 

200 records in
200 records out
200 records in
200 records out

real1m0.046s
user0m0.270s
sys 0m30.008s

real1m0.047s
user0m0.287s
sys 0m30.675s

So that's 32,000 MB written in 60.05 seconds, which is 533MB/s sustained
with two threads.

Now to read the same files in parallel:
= Read test results ==
sync
time dd of=/dev/null if=/dbfast1/llonergan/bigfile bs=8k 
time dd of=/dev/null if=/dbfast3/llonergan/bigfile bs=8k 

200 records in
200 records out

real0m39.849s
user0m0.282s
sys 0m22.294s
200 records in
200 records out

real0m40.410s
user0m0.251s
sys 0m22.515s

And that's 32,000MB in 40.4 seconds, or 792MB/s sustained from disk (not
memory).

These are each RAID5 arrays of 8 internal SATA disks on 3Ware HW RAID
controllers.

Now for real usage, let's run a simple sequential scan query on 123,434 MB
of data in a single table on 4 of these machines in parallel.  All tables
are distributed evenly by Bizgres MPP over all 8 filesystems:

= Bizgres MPP sequential scan results =

[EMAIL PROTECTED] ~]$ !psql
psql -p  -U mppdemo1 demo
Welcome to psql 8.1.1 (server 8.1.3), the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help with psql commands
   \g or terminate with semicolon to execute query
   \q to quit

demo=# \timing
Timing is on.
demo=# select version();
   
version


-
 PostgreSQL 8.1.3 (Bizgres MPP 2.1) on x86_64-unknown-linux-gnu, compiled by
GCC gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) compiled on Feb 23 2006
11:34:06
(1 row)

Time: 0.570 ms
demo=# select relname,8*relpages/128 as MB from pg_class order by relpages
desc limit 6;
relname |   mb
---
 lineitem   | 123434
 orders |  24907
 partsupp   |  14785
 part   |   3997
 customer   |   3293
 supplier   |202
(6 rows)

Time: 1.824 ms
demo=# select count(*) from lineitem;
   count   
---
 600037902
(1 row)

Time: 60300.960 ms

So that's 123,434 MB of data scanned in 60.3 seconds, or 2,047 MB/s on 4
machines, which uses 512MB/s of disk bandwidth on each machine.

Now let's do a query that uses a this big table (a two way join) using all 4
machines:
= Bizgres MPP Query results =
demo=# select
demo-# sum(l_extendedprice* (1 - l_discount)) as revenue
demo-# from
demo-# lineitem,
demo-# part
demo-# where   
demo-# (
demo(# p_partkey = l_partkey
demo(# and p_brand = 'Brand#42'
demo(# and p_container in ('SM CASE', 'SM BOX', 'SM PACK',
'SM PKG')
demo(# and l_quantity = 7 and l_quantity = 7  10
demo(# and p_size between 1 and 5
demo(# and l_shipmode in ('AIR', 'AIR REG')
demo(# and l_shipinstruct = 'DELIVER IN PERSON'
demo(# )   
demo-# or  
demo-# (   
demo(# p_partkey = l_partkey
demo(# and p_brand = 'Brand#15'
demo(# and p_container in ('MED BAG', 'MED BOX', 'MED PKG',
'MED PACK')
demo(# and l_quantity = 14 and l_quantity = 14  10
demo(# and p_size between 1 and 10
demo(# and l_shipmode in ('AIR', 'AIR REG')
demo(# and l_shipinstruct = 'DELIVER IN PERSON'
demo(# )
demo-# or
demo-# (
demo(# p_partkey = l_partkey
demo(# and p_brand = 'Brand#53'
demo(# and p_container in ('LG CASE', 'LG BOX', 'LG PACK',
'LG PKG')
demo(# and l_quantity = 22 and 

Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Mark Kirkwood
Luke Lonergan wrote:

 
 OK, how about some proof?
 
 In a synthetic test that writes 32GB of sequential 8k pages on a machine
 with 16GB of RAM:
 = Write test results ==
 time bash -c dd if=/dev/zero of=/dbfast1/llonergan/bigfile bs=8k
 count=200  sync 
 time bash -c dd if=/dev/zero of=/dbfast3/llonergan/bigfile bs=8k
 count=200  sync 
 
 200 records in
 200 records out
 200 records in
 200 records out
 
 real1m0.046s
 user0m0.270s
 sys 0m30.008s
 
 real1m0.047s
 user0m0.287s
 sys 0m30.675s
 
 So that's 32,000 MB written in 60.05 seconds, which is 533MB/s sustained
 with two threads.
 

Well, since this is always fun (2G memory, 3Ware 7506, 4xPATA), writing:

$ dd if=/dev/zero of=/data0/dump/bigfile bs=8k count=50
50 records in
50 records out
409600 bytes transferred in 32.619208 secs (125570185 bytes/sec)

 Now to read the same files in parallel:
 = Read test results ==
 sync
 time dd of=/dev/null if=/dbfast1/llonergan/bigfile bs=8k 
 time dd of=/dev/null if=/dbfast3/llonergan/bigfile bs=8k 
 
 200 records in
 200 records out
 
 real0m39.849s
 user0m0.282s
 sys 0m22.294s
 200 records in
 200 records out
 
 real0m40.410s
 user0m0.251s
 sys 0m22.515s
 
 And that's 32,000MB in 40.4 seconds, or 792MB/s sustained from disk (not
 memory).
 

Reading:

$ dd of=/dev/null if=/data0/dump/bigfile bs=8k count=50
50 records in
50 records out
409600 bytes transferred in 24.067298 secs (170189442 bytes/sec)

Ok - didn't quite get my quoted 175MB/s, (FWIW if bs=32k I get exactly
175MB/s).

Hmmm - a bit humbled by Luke's machinery :-), however, mine is probably
competitive on (MB/s)/$


It would be interesting to see what Dan's system would do on a purely
sequential workload - as 40-50MB of purely random IO is high.

Cheers

Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,


Hmmm - a bit humbled by Luke's machinery :-), however, mine is probably
competitive on (MB/s)/$



Not sure - the machines I cite are about $10K each.  The machine you tested
was probably about $1500 a few years ago (my guess), and with a 5:1 ratio in
speed versus about a 6:1 ratio in price, we're not too far off in MB/s/$
after all :-)
 


Wow - that is remarkable performance for $10K!

Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Reliability recommendations

2006-02-15 Thread Jeremy Haile
We are a small company looking to put together the most cost effective
solution for our production database environment.  Currently in
production Postgres 8.1 is running on this machine:

Dell 2850
2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache
4 GB DDR2 400 Mhz
2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
4 x 146 GB 10K SCSI RAID 10 (for postgres data)
Perc4ei controller

The above is a standard Dell box with nothing added or modified beyond
the options available directly through Dell. We had a bad processor last
week that effectively put us down for an entire weekend. Though it was
the web server that failed, the experience has caused us to step back
and spend time coming up with a more reliable/fail-safe solution that
can reduce downtime.

Our load won't be substantial so extreme performance and load balancing
are not huge concerns. We are looking for good performance, at a good
price, configured in the most redundant, high availability manner
possible. Availability is the biggest priority.

I sent our scenario to our sales team at Dell and they came back with
all manner of SAN, DAS, and configuration costing as much as $50k.

We have the budget to purchase 2-3 additional machines along the lines
of the one listed above. As a startup with a limited budget, what would
this list suggest as options for clustering/replication or setting our
database up well in general?

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Reliability recommendations

2006-02-15 Thread Craig A. James

Jeremy Haile wrote:

We are a small company looking to put together the most cost effective
solution for our production database environment.  Currently in
production Postgres 8.1 is running on this machine:

Dell 2850
2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache
4 GB DDR2 400 Mhz
2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
4 x 146 GB 10K SCSI RAID 10 (for postgres data)
Perc4ei controller

... I sent our scenario to our sales team at Dell and they came back with
all manner of SAN, DAS, and configuration costing as much as $50k.


Given what you've told us, a $50K machine is not appropriate.

Instead, think about a simple system with several clones of the database and a 
load-balancing web server, even if one machine could handle your load.  If a 
machine goes down, the load balancer automatically switches to the other.

Look at the MTBF figures of two hypothetical machines:

Machine 1: Costs $2,000, MTBF of 2 years, takes two days to fix on average.
Machine 2: Costs $50,000, MTBF of 100 years (!), takes one hour to fix on 
average.

Now go out and buy three of the $2,000 machines.  Use a load-balancer front end web 
server that can send requests round-robin fashion to a server farm.  Clone 
your database.  In fact, clone the load-balancer too so that all three machines have all 
software and databases installed.  Call these A, B, and C machines.

At any given time, your Machine A is your web front end, serving requests to 
databases on A, B and C.  If B or C goes down, no problem - the system keeps 
running.  If A goes down, you switch the IP address of B or C and make it your 
web front end, and you're back in business in a few minutes.

Now compare the reliability -- in order for this system to be disabled, you'd have to have ALL THREE computers fail at the same time.  With the MTBF and repair time of two days, each machine has a 99.726% uptime.  The MTBF, that is, the expected time until all three machines will fail simultaneously, is well over 100,000 years!  Of course, this is silly, machines don't last that long, but it illustrates the point:  Redundancy is beats reliability (which is why RAID is so useful). 


All for $6,000.

Craig

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Reliability recommendations

2006-02-15 Thread Mark Lewis
Machine 1: $2000
Machine 2: $2000
Machine 3: $2000

Knowing how to rig them together and maintain them in a fully fault-
tolerant way: priceless.


(Sorry for the off-topic post, I couldn't resist).

-- Mark Lewis

On Wed, 2006-02-15 at 09:19 -0800, Craig A. James wrote:
 Jeremy Haile wrote:
  We are a small company looking to put together the most cost effective
  solution for our production database environment.  Currently in
  production Postgres 8.1 is running on this machine:
  
  Dell 2850
  2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache
  4 GB DDR2 400 Mhz
  2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
  4 x 146 GB 10K SCSI RAID 10 (for postgres data)
  Perc4ei controller
  
  ... I sent our scenario to our sales team at Dell and they came back with
  all manner of SAN, DAS, and configuration costing as much as $50k.
 
 Given what you've told us, a $50K machine is not appropriate.
 
 Instead, think about a simple system with several clones of the database and 
 a load-balancing web server, even if one machine could handle your load.  If 
 a machine goes down, the load balancer automatically switches to the other.
 
 Look at the MTBF figures of two hypothetical machines:
 
  Machine 1: Costs $2,000, MTBF of 2 years, takes two days to fix on average.
  Machine 2: Costs $50,000, MTBF of 100 years (!), takes one hour to fix on 
 average.
 
 Now go out and buy three of the $2,000 machines.  Use a load-balancer front 
 end web server that can send requests round-robin fashion to a server farm. 
  Clone your database.  In fact, clone the load-balancer too so that all three 
 machines have all software and databases installed.  Call these A, B, and C 
 machines.
 
 At any given time, your Machine A is your web front end, serving requests to 
 databases on A, B and C.  If B or C goes down, no problem - the system keeps 
 running.  If A goes down, you switch the IP address of B or C and make it 
 your web front end, and you're back in business in a few minutes.
 
 Now compare the reliability -- in order for this system to be disabled, you'd 
 have to have ALL THREE computers fail at the same time.  With the MTBF and 
 repair time of two days, each machine has a 99.726% uptime.  The MTBF, that 
 is, the expected time until all three machines will fail simultaneously, is 
 well over 100,000 years!  Of course, this is silly, machines don't last that 
 long, but it illustrates the point:  Redundancy is beats reliability (which 
 is why RAID is so useful). 
 
 All for $6,000.
 
 Craig
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Reliability recommendations

2006-02-15 Thread Joshua D. Drake

Jeremy Haile wrote:

We are a small company looking to put together the most cost effective
solution for our production database environment.  Currently in
production Postgres 8.1 is running on this machine:

Dell 2850
2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache
4 GB DDR2 400 Mhz
2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
4 x 146 GB 10K SCSI RAID 10 (for postgres data)
Perc4ei controller

The above is a standard Dell box with nothing added or modified beyond
the options available directly through Dell.

You should probably review the archives for PostgreSQL user
experience with Dell's before you purchase one.


I sent our scenario to our sales team at Dell and they came back with
all manner of SAN, DAS, and configuration costing as much as $50k.
  

HAHAHAHAHA Don't do that. Dell is making the assumption
you won't do your homework. Make sure you cross quote with IBM,
Compaq and Penguin Computing...

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Reliability recommendations

2006-02-15 Thread Greg Stark

Joshua D. Drake [EMAIL PROTECTED] writes:

 Jeremy Haile wrote:
  We are a small company looking to put together the most cost effective
  solution for our production database environment.  Currently in
  production Postgres 8.1 is running on this machine:
 
  Dell 2850
  2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache
  4 GB DDR2 400 Mhz
  2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
  4 x 146 GB 10K SCSI RAID 10 (for postgres data)
  Perc4ei controller

You don't say how this box is performing. There's no way to give
recommendations in a vacuum. Some users really need $50k boxes and others
(most) don't. This looks like a pretty good setup for Postgres and you would
have to be pushing things pretty hard to need much more.

That said some users have reported problems with Dell's raid controllers even
when the same brand's regular controllers worked well. That's what Joshua is
referring to.

  The above is a standard Dell box with nothing added or modified beyond
  the options available directly through Dell.

 You should probably review the archives for PostgreSQL user
 experience with Dell's before you purchase one.


-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Reliability recommendations

2006-02-15 Thread Christopher Browne
After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Joshua D. Drake) 
belched out:
 Jeremy Haile wrote:
 We are a small company looking to put together the most cost effective
 solution for our production database environment.  Currently in
 production Postgres 8.1 is running on this machine:

 Dell 2850
 2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache
 4 GB DDR2 400 Mhz
 2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
 4 x 146 GB 10K SCSI RAID 10 (for postgres data)
 Perc4ei controller

 The above is a standard Dell box with nothing added or modified beyond
 the options available directly through Dell.

 You should probably review the archives for PostgreSQL user
 experience with Dell's before you purchase one.

Hear, hear!  We found Dell servers were big-time underperformers.

Generic hardware put together with generally the same brand names of
components (e.g. - for SCSI controllers and such) would generally play
much better.

For the cheapo desktop boxes they obviously have to buy the cheapest
hardware available this week; it sure seems as though they engage in
the same sort of thing with the server class hardware.

I don't think anyone has been able to forcibly point out any
completely precise shortcoming; just that they underperform what the
specs suggest they ought to be able to provide.

 I sent our scenario to our sales team at Dell and they came back
 with all manner of SAN, DAS, and configuration costing as much as
 $50k.

 HAHAHAHAHA Don't do that. Dell is making the assumption you
 won't do your homework. Make sure you cross quote with IBM, Compaq
 and Penguin Computing...

Indeed.
-- 
(format nil [EMAIL PROTECTED] cbbrowne gmail.com)
http://linuxdatabases.info/info/rdbms.html
Rules of  the Evil  Overlord #141.  As  an alternative to  not having
children, I  will have _lots_ of  children.  My sons will  be too busy
jockeying for  position to  ever be a  real threat, and  the daughters
will all sabotage each other's attempts to win the hero.
http://www.eviloverlord.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Reliability recommendations

2006-02-15 Thread Scott Marlowe
On Wed, 2006-02-15 at 12:44, Christopher Browne wrote:
 After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Joshua D. 
 Drake) belched out:

  You should probably review the archives for PostgreSQL user
  experience with Dell's before you purchase one.
 
 Hear, hear!  We found Dell servers were big-time underperformers.
 
 Generic hardware put together with generally the same brand names of
 components (e.g. - for SCSI controllers and such) would generally play
 much better.

My experience has been that:

A:  Their rebranded LSI and Adaptec RAID controllers underperform.
B:  Their BIOS updates for said cards and the mobos for the 26xx series
comes in a format that requires you to have a friggin bootable DOS
floppy.  What is this, 1987???
C:  They use poorly performing mobo chipsets.

We had a dual P-III-750 with a REAL LSI RAID card and an intel mobo, and
replaced it with a dual P-IV 2800 Dell 2600 with twice the RAM.  As a
database server the P-III-750 was easily a match for the new dell, and
in some ways (i/o) outran it.

We also had a dual 2400 PIV Intel generic box, and it spanked the Dell
handily at everything, was easier to work on, the parts cost less, and
it used bog standard RAID cards and such.  I would highly recommend the
Intel Generic hardware over Dell any day.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Reliability recommendations

2006-02-15 Thread Ron

At 11:21 AM 2/15/2006, Jeremy Haile wrote:

We are a small company looking to put together the most cost effective
solution for our production database environment.  Currently in
production Postgres 8.1 is running on this machine:

Dell 2850
2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache
4 GB DDR2 400 Mhz
2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
4 x 146 GB 10K SCSI RAID 10 (for postgres data)
Perc4ei controller

The above is a standard Dell box with nothing added or modified beyond
the options available directly through Dell. We had a bad processor last
week that effectively put us down for an entire weekend. Though it was
the web server that failed, the experience has caused us to step back
and spend time coming up with a more reliable/fail-safe solution that
can reduce downtime.

Our load won't be substantial so extreme performance and load balancing
are not huge concerns. We are looking for good performance, at a good
price, configured in the most redundant, high availability manner
possible. Availability is the biggest priority.

I sent our scenario to our sales team at Dell and they came back with
all manner of SAN, DAS, and configuration costing as much as $50k.

We have the budget to purchase 2-3 additional machines along the lines
of the one listed above. As a startup with a limited budget, what would
this list suggest as options for clustering/replication or setting our
database up well in general?


1= Tell Dell Thanks but no thanks. and do not buy any more 
equipment from them.  Their value per $$ is less than other options 
available to you.


2= The current best bang for the buck HW (and in many cases, best 
performing as well) for pg:
  a= AMD K8 and K9 (dual core) CPUs.  Particularly the A64 X2 3800+ 
when getting the most for your $$ matters a lot

   pg gets a nice performance boost from running in 64b.
  b= Decent Kx server boards are available from Gigabyte, IWill, 
MSI, Supermicro, and Tyan to name a few.
   IWill has a 2P 16 DIMM slot board that is particularly nice 
for a server that needs lots of RAM.
  c= Don't bother with SCSI or FC HD's unless you are doing the most 
demanding kind of OLTP.  SATA II HD's provide better value.
  d= HW RAID controllers are only worth it in certain 
scenarios.  Using RAID 5 almost always means you should use a HW RAID 
controller.
  e= The only HW RAID controllers worth the $$ for you are 3ware 
Escalade 9550SX's and Areca ARC-11xx or ARC-12xx's.
  *For the vast majority of throughput situations, the ARC-1xxx's 
with = 1GB of battery backed WB cache are the best value*
  f= 1GB RAM sticks are cheap enough and provide enough value that 
you should max out any system you get with them.
  g= for +high+ speed fail over, Chelsio and others are now making 
PCI-X and PCI-E 10GbE NICs at reasonable prices.
The above should serve as a good pick list for the components of 
any servers you need.


3= The most economically sound HW and SW architecture that best suits 
your performance and reliability needs is context dependent to your 
specific circumstances.



Where are you located?
Ron




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Reliability recommendations

2006-02-15 Thread Josh Rovero

Jeremy Haile wrote:

Thanks for everyone's feedback.  I will definitely take the hardware
comments into consideration when purchasing future hardware.  I am
located in Atlanta, GA.  If Dell has such a bad reputation with this
list, does anyone have good vendor recommendations?   
  

I can recommend Penguin Computing (even our windows weenies like them),
ASA, and HP Proliant.  AMD Opteron *is* the way to go.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster