Re: [PERFORM] planer chooses very bad plan

2010-04-11 Thread Luke Lonergan
Try random_page_cost=100 

- Luke
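
A rough way to try that for a single session, without touching postgresql.conf
(using the query from the message below; the value is only a starting point for
experimentation):

SET random_page_cost = 100;
EXPLAIN ANALYZE SELECT * FROM telegrams
WHERE (recipient_id=508933 AND recipient_deleted=FALSE)
   OR (user_id=508933 AND user_deleted=FALSE)
ORDER BY id DESC LIMIT 10 OFFSET 0;
RESET random_page_cost;  -- back to the server default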

- Original Message -
From: pgsql-performance-ow...@postgresql.org 

To: pgsql-performance@postgresql.org 
Sent: Sun Apr 11 14:12:30 2010
Subject: [PERFORM] planer chooses very bad plan

Hi,

I'm having a query where the planner chooses a very bad plan.

explain analyze SELECT * FROM "telegrams" WHERE ((recipient_id=508933 
AND recipient_deleted=FALSE) OR (user_id=508933 AND user_deleted=FALSE)) 
ORDER BY id DESC LIMIT 10 OFFSET 0

"Limit  (cost=0.00..1557.67 rows=10 width=78) (actual 
time=0.096..2750.058 rows=5 loops=1)"
"  ->  Index Scan Backward using telegrams_pkey on telegrams  
(cost=0.00..156545.47 rows=1005 width=78) (actual time=0.093..2750.052 
rows=5 loops=1)"
"Filter: (((recipient_id = 508933) AND (NOT recipient_deleted)) 
OR ((user_id = 508933) AND (NOT user_deleted)))"
"Total runtime: 2750.124 ms"


When I force the planner not to use index scans, the plan looks MUCH 
better (10,000x faster):

set enable_indexscan = false;
explain analyze SELECT * FROM "telegrams" WHERE ((recipient_id=508933 
AND recipient_deleted=FALSE) OR (user_id=508933 AND user_deleted=FALSE)) 
ORDER BY id DESC LIMIT 10 OFFSET 0

"Limit  (cost=2547.16..2547.16 rows=10 width=78) (actual 
time=0.179..0.185 rows=5 loops=1)"
"  ->  Sort  (cost=2547.16..2547.41 rows=1005 width=78) (actual 
time=0.177..0.178 rows=5 loops=1)"
"Sort Key: id"
"Sort Method:  quicksort  Memory: 26kB"
"->  Bitmap Heap Scan on telegrams  (cost=17.39..2544.98 
rows=1005 width=78) (actual time=0.124..0.158 rows=5 loops=1)"
"  Recheck Cond: ((recipient_id = 508933) OR (user_id = 
508933))"
"  Filter: (((recipient_id = 508933) AND (NOT 
recipient_deleted)) OR ((user_id = 508933) AND (NOT user_deleted)))"
"  ->  BitmapOr  (cost=17.39..17.39 rows=1085 width=0) 
(actual time=0.104..0.104 rows=0 loops=1)"
"->  Bitmap Index Scan on telegrams_recipient  
(cost=0.00..8.67 rows=536 width=0) (actual time=0.033..0.033 rows=1 
loops=1)"
"  Index Cond: (recipient_id = 508933)"
"->  Bitmap Index Scan on telegrams_user  
(cost=0.00..8.67 rows=549 width=0) (actual time=0.069..0.069 rows=4 
loops=1)"
"  Index Cond: (user_id = 508933)"
"Total runtime: 0.276 ms"


The table contains several million records and it has just been 
reindexed/analyzed.

Are there any parameters I can tune so that pgsql itself chooses the 
best plan? :)

# - Memory -
shared_buffers = 256MB
temp_buffers = 32MB
work_mem = 4MB
maintenance_work_mem = 32MB

# - Planner Cost Constants -
seq_page_cost = 1.0
random_page_cost = 2.5
cpu_tuple_cost = 0.001
cpu_index_tuple_cost = 0.0005
cpu_operator_cost = 0.00025
effective_cache_size = 20GB

# - Genetic Query Optimizer -
geqo = on

Thanks,
Corin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] I have a fusion IO drive available for testing

2009-03-26 Thread Luke Lonergan
XFS

- Luke


From: pgsql-performance-ow...@postgresql.org 

To: pgsql-performance@postgresql.org 
Sent: Thu Mar 26 05:47:55 2009
Subject: [PERFORM] I have a fusion IO drive available for testing

So far using dd I am seeing around 264MB/s on ext3, 335MB/s on ext2 write 
speed. So the question becomes what is the best filesystem for this drive?
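
For comparison across filesystems, the write/read test being described is
usually something like the following (mount point and size are placeholders;
the file should be at least 2x RAM so the OS cache doesn't flatter the result):

$ time sh -c "dd if=/dev/zero of=/mnt/fio/bigfile bs=8k count=4000000 && sync"
$ time dd if=/mnt/fio/bigfile of=/dev/null bs=8k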

Anyone want me to run anything on it ?

Dave


Re: [PERFORM] SSD performance

2009-01-23 Thread Luke Lonergan
Hmm - I wonder what OS it runs ;-)

- Luke

- Original Message -
From: da...@lang.hm 
To: Luke Lonergan
Cc: glynast...@yahoo.co.uk ; 
pgsql-performance@postgresql.org 
Sent: Fri Jan 23 04:52:27 2009
Subject: Re: [PERFORM] SSD performance

On Fri, 23 Jan 2009, Luke Lonergan wrote:

> Why not simply plug your server into a UPS and get 10-20x the
> performance using the same approach (with OS IO cache)?
>
> In fact, with the server it's more robust, as you don't have to transit
> several intervening physical devices to get to the RAM.
>
> If you want a file interface, declare a RAMDISK.
>
> Cheaper/faster/improved reliability.

you can also disable fsync to not wait for your disks if you trust your
system to never go down. personally I don't trust any system to not go
down.

if you have a system crash or reboot your RAMDISK will lose its content,
this device won't.

also you are limited to how many DIMMs you can put on your motherboard
(for the dual-socket systems I am buying nowadays, I'm limited to 32G of
ram); going to a different motherboard that can support additional ram can
be quite expensive.

this isn't for everyone, but for people who need the performance and data
reliability, this looks like a very interesting option.

David Lang

> - Luke
>
> - Original Message -
> From: pgsql-performance-ow...@postgresql.org 
> 
> To: Glyn Astill 
> Cc: pgsql-performance@postgresql.org 
> Sent: Fri Jan 23 04:39:07 2009
> Subject: Re: [PERFORM] SSD performance
>
> On Fri, 23 Jan 2009, Glyn Astill wrote:
>
>>> I spotted a new interesting SSD review. it's a $379
>>> 5.25" drive bay device that holds up to 8 DDR2 DIMMS
>>> (up to 8G per DIMM) and appears to the system as a SATA
>>> drive (or a pair of SATA drives that you can RAID-0 to get
>>> past the 300MB/s SATA bottleneck)
>>>
>>
>> Sounds very similar to the Gigabyte iRam drives of a few years ago
>>
>> http://en.wikipedia.org/wiki/I-RAM
>
> similar concept, but there are some significant differences
>
> the iRam was limited to 4G, used DDR ram, and used a PCI slot for power
> (which can be in short supply nowadays)
>
> this new drive can go to 64G, uses DDR2 ram (cheaper than DDR nowadays),
> gets powered like a normal SATA drive, can use two SATA channels (to be
> able to get past the throughput limits of a single SATA interface), and
> has a CF card slot to backup the data to if the system powers down.
>
> plus the performance appears to be significantly better (even without
> using the second SATA interface)
>
> David Lang
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD performance

2009-01-23 Thread Luke Lonergan
Why not simply plug your server into a UPS and get 10-20x the performance using 
the same approach (with OS IO cache)?

In fact, with the server it's more robust, as you don't have to transit several 
intervening physical devices to get to the RAM.

If you want a file interface, declare a RAMDISK.

Cheaper/faster/improved reliability.
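
For reference, a RAM disk of that kind can be declared roughly like this on
Linux (mount point, size and tablespace name are placeholders; anything put on
it is gone after a crash or reboot, which is exactly the trade-off being
debated here):

# tmpfs-backed RAM disk; contents do not survive a restart
mount -t tmpfs -o size=16g tmpfs /mnt/ramdisk
mkdir /mnt/ramdisk/pgspace && chown postgres /mnt/ramdisk/pgspace
# then, from psql, keep only disposable data on it:
#   CREATE TABLESPACE ramspace LOCATION '/mnt/ramdisk/pgspace';
#   CREATE TABLE scratch (id int, payload text) TABLESPACE ramspace;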

- Luke

- Original Message -
From: pgsql-performance-ow...@postgresql.org 

To: Glyn Astill 
Cc: pgsql-performance@postgresql.org 
Sent: Fri Jan 23 04:39:07 2009
Subject: Re: [PERFORM] SSD performance

On Fri, 23 Jan 2009, Glyn Astill wrote:

>> I spotted a new interesting SSD review. it's a $379
>> 5.25" drive bay device that holds up to 8 DDR2 DIMMS
>> (up to 8G per DIMM) and appears to the system as a SATA
>> drive (or a pair of SATA drives that you can RAID-0 to get
>> past the 300MB/s SATA bottleneck)
>>
>
> Sounds very similar to the Gigabyte iRam drives of a few years ago
>
> http://en.wikipedia.org/wiki/I-RAM

similar concept, but there are some significant differences

the iRam was limited to 4G, used DDR ram, and used a PCI slot for power
(which can be in short supply nowadays)

this new drive can go to 64G, uses DDR2 ram (cheaper than DDR nowadays),
gets powered like a normal SATA drive, can use two SATA channels (to be
able to get past the throughput limits of a single SATA interface), and
has a CF card slot to backup the data to if the system powers down.

plus the performance appears to be significantly better (even without
using the second SATA interface)

David Lang


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] understanding postgres issues/bottlenecks

2009-01-11 Thread Luke Lonergan
Not to mention the #1 cause of server faults in my experience: OS kernel bug 
causes a crash.  Battery backup doesn't help you much there.

Fsync of log is necessary IMO.

That said, you could use a replication/backup strategy to get a consistent 
snapshot in the past if you don't mind losing some data or can recreate it from 
backup elsewhere.

I think a strategy that puts the WAL on an SLC SSD is a very good one as of 
Jan/09 and will get much better in short order.
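
A common way to do that in 8.x is to relocate pg_xlog onto the SSD and leave a
symlink behind; a sketch with invented paths (stop the server first):

pg_ctl -D /var/lib/pgsql/data stop
mv /var/lib/pgsql/data/pg_xlog /mnt/slc_ssd/pg_xlog
ln -s /mnt/slc_ssd/pg_xlog /var/lib/pgsql/data/pg_xlog
pg_ctl -D /var/lib/pgsql/data start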

- Luke

- Original Message -
From: pgsql-performance-ow...@postgresql.org 

To: Glyn Astill 
Cc: Ron ; Scott Marlowe ; 
pgsql-performance@postgresql.org 
Sent: Sun Jan 11 15:35:22 2009
Subject: Re: [PERFORM] understanding postgres issues/bottlenecks

On Sun, 11 Jan 2009, Glyn Astill wrote:

> --- On Sun, 11/1/09, Scott Marlowe  wrote:
>
>> They also told me we could never lose power in the hosting
>> center
>> because it was so wonderful and redundant and that I was
>> wasting my time.
>
> Well, that's just plain silly, at the very least there's always going to
> be some breakers / fuses in between the power and the machines.
>
> In fact in our building there's quite a few breaker points between our
> comms room on the 3rd floor and the ups / generator in the basement.
> It's a crappy implementation actually.

the response I get from people is that they give their servers redundant
power supplies and put them on separate circuits so they must be safe from
that.

but as commented, that's not enough in the real world.

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] understanding postgres issues/bottlenecks

2009-01-10 Thread Luke Lonergan
I believe they write at 200MB/s which is outstanding for sequential BW.  Not 
sure about the write latency, though the Anandtech benchmark results showed 
high detail and IIRC the write latencies were very good.

- Luke

- Original Message -
From: da...@lang.hm 
To: Luke Lonergan
Cc: st...@enterprisedb.com ; mar...@bluegap.ch 
; scott.marl...@gmail.com ; 
rjpe...@earthlink.net ; pgsql-performance@postgresql.org 

Sent: Sat Jan 10 16:03:32 2009
Subject: Re: [PERFORM] understanding postgres issues/bottlenecks

On Sat, 10 Jan 2009, Luke Lonergan wrote:

> The new MLC based SSDs have better wear leveling tech and don't suffer
> the pauses.  Intel X25-M 80 and 160 GB SSDs are both pause-free.  See
> Anandtech's test results for details.

they don't suffer the pauses, but they still don't have fantastic write
speeds.

David Lang

> Intel's SLC SSDs should also be good enough but they're smaller.
>
> - Luke
>
> - Original Message -
> From: pgsql-performance-ow...@postgresql.org 
> 
> To: Gregory Stark 
> Cc: Markus Wanner ; Scott Marlowe 
> ; Ron ; 
> pgsql-performance@postgresql.org 
> Sent: Sat Jan 10 14:40:51 2009
> Subject: Re: [PERFORM] understanding postgres issues/bottlenecks
>
> On Sat, 10 Jan 2009, Gregory Stark wrote:
>
>> da...@lang.hm writes:
>>
>>> On Sat, 10 Jan 2009, Markus Wanner wrote:
>>>
>>>> My understanding of SSDs so far is, that they are not that bad at
>>>> writing *on average*, but to perform wear-leveling, they sometimes have
>>>> to shuffle around multiple blocks at once. So there are pretty awful
>>>> spikes for writing latency (IIRC more than 100ms has been measured on
>>>> cheaper disks).
>>
>> That would be fascinating. And frightening. A lot of people have been
>> recommending these for WAL disks and this would make them actually *worse*
>> than regular drives.
>>
>>> well, I have one of those cheap disks.
>>>
>>> brand new out of the box, format the 32G drive, then copy large files to it
>>> (~1G per file). this should do almost no wear-leveling, but its write
>>> performance is still poor and it has occasional 1 second pauses.
>>
>> This isn't similar to the way WAL behaves though. What you're testing is the
>> behaviour when the bandwidth to the SSD is saturated. At that point some 
>> point
>> in the stack, whether in the SSD, the USB hardware or driver, or OS buffer
>> cache can start to queue up writes. The stalls you see could be the behaviour
>> when that queue fills up and it needs to push back to higher layers.
>>
>> To simulate WAL you want to transfer smaller volumes of data, well below the
>> bandwidth limit of the drive, fsync the data, then pause a bit repeat. Time
>> each fsync and see whether the time they take is proportional to the amount 
>> of
>> data written in the meantime or whether they randomly spike upwards.
>
> if you have a specific benchmark for me to test I would be happy to do
> this.
>
> the test that I did is basically the best-case for the SSD (more-or-less
> sequential writes where the vendors claim that the drives match or
> slightly outperform the traditional disks). for random writes the vendors
> put SSDs at fewer IOPS than 5400 rpm drives, let alone 15K rpm drives.
>
> take a look at this paper
> http://www.imation.com/PageFiles/83/Imation-SSD-Performance-White-Paper.pdf
>
> this is not one of the low-performance drives, they include a sandisk
> drive in the paper that shows significantly less performance (but the same
> basic pattern) than the imation drives.
>
> David Lang
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] understanding postgres issues/bottlenecks

2009-01-10 Thread Luke Lonergan
The new MLC based SSDs have better wear leveling tech and don't suffer the 
pauses.  Intel X25-M 80 and 160 GB SSDs are both pause-free.  See Anandtech's 
test results for details.

Intel's SLC SSDs should also be good enough but they're smaller.

- Luke

- Original Message -
From: pgsql-performance-ow...@postgresql.org 

To: Gregory Stark 
Cc: Markus Wanner ; Scott Marlowe ; 
Ron ; pgsql-performance@postgresql.org 

Sent: Sat Jan 10 14:40:51 2009
Subject: Re: [PERFORM] understanding postgres issues/bottlenecks

On Sat, 10 Jan 2009, Gregory Stark wrote:

> da...@lang.hm writes:
>
>> On Sat, 10 Jan 2009, Markus Wanner wrote:
>>
>>> My understanding of SSDs so far is, that they are not that bad at
>>> writing *on average*, but to perform wear-leveling, they sometimes have
>>> to shuffle around multiple blocks at once. So there are pretty awful
>>> spikes for writing latency (IIRC more than 100ms has been measured on
>>> cheaper disks).
>
> That would be fascinating. And frightening. A lot of people have been
> recommending these for WAL disks and this would make them actually *worse*
> than regular drives.
>
>> well, I have one of those cheap disks.
>>
>> brand new out of the box, format the 32G drive, then copy large files to it
>> (~1G per file). this should do almost no wear-leveling, but its write
>> performance is still poor and it has occasional 1 second pauses.
>
> This isn't similar to the way WAL behaves though. What you're testing is the
> behaviour when the bandwidth to the SSD is saturated. At that point some point
> in the stack, whether in the SSD, the USB hardware or driver, or OS buffer
> cache can start to queue up writes. The stalls you see could be the behaviour
> when that queue fills up and it needs to push back to higher layers.
>
> To simulate WAL you want to transfer smaller volumes of data, well below the
> bandwidth limit of the drive, fsync the data, then pause a bit repeat. Time
> each fsync and see whether the time they take is proportional to the amount of
> data written in the meantime or whether they randomly spike upwards.

if you have a specific benchmark for me to test I would be happy to do
this.
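
A rough sketch of such a probe (path, sizes and pacing are arbitrary; each
iteration appends a small amount of data, fsyncs it, and is timed on its own):

#!/bin/bash
# crude fsync-latency test: small appends, each followed by an fsync,
# well below the drive's sequential bandwidth
PROBE=/mnt/ssd/fsync_probe
for i in $(seq 1 100); do
    time dd if=/dev/zero of=$PROBE bs=8k count=16 \
        oflag=append conv=notrunc,fsync 2>/dev/null
    sleep 0.2
done
rm -f $PROBE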

the test that I did is basically the best-case for the SSD (more-or-less
sequential writes where the vendors claim that the drives match or
slightly outperform the traditional disks). for random writes the vendors
put SSDs at fewer IOPS than 5400 rpm drives, let alone 15K rpm drives.

take a look at this paper
http://www.imation.com/PageFiles/83/Imation-SSD-Performance-White-Paper.pdf

this is not one of the low-performance drives, they include a sandisk
drive in the paper that shows significantly less performance (but the same
basic pattern) than the imation drives.

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-08 Thread Luke Lonergan
Your expected write speed on a 4-drive RAID10 is two drives' worth, probably 160 
MB/s, depending on the generation of drives.

The expected write speed for a 6-drive RAID5 is five drives' worth, or about 400 
MB/s, sans the RAID5 parity overhead.

- Luke

- Original Message -
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org 
Sent: Fri Aug 08 10:23:55 2008
Subject: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

Hello list,

I have a server with direct attached storage containing 4 15k SAS
drives and 6 standard SATA drives.
The server is a quad-core Xeon with 16GB RAM.
Both server and DAS have dual PERC/6E RAID controllers with 512 MB BBU.

There are 2 RAID sets configured:
One RAID 10 containing 4 SAS disks
One RAID 5 containing 6 SATA disks

There is one partition per RAID set with ext2 filesystem.

I ran the following iozone test which I stole from Joshua Drake's test  
at
http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

I ran this test against the RAID 5 SATA partition

#iozone -e -i0 -i1 -i2 -i8 -t1 -s 1000m -r 8k -+u

With these random write results

Children see throughput for 1 random writers=  168647.33 KB/sec
Parent sees throughput for 1 random writers =  168413.61 KB/sec
Min throughput per process  =  168647.33 KB/sec
Max throughput per process  =  168647.33 KB/sec
Avg throughput per process  =  168647.33 KB/sec
Min xfer= 1024000.00 KB
CPU utilization: Wall time    6.072    CPU time    0.540    CPU utilization   8.89 %

Almost 170 MB/sec. Not bad for 6 standard SATA drives.

Then I ran the same thing against the RAID 10 SAS partition

Children see throughput for 1 random writers=   68816.25 KB/sec
Parent sees throughput for 1 random writers =   68767.90 KB/sec
Min throughput per process  =   68816.25 KB/sec
Max throughput per process  =   68816.25 KB/sec
Avg throughput per process  =   68816.25 KB/sec
Min xfer= 1024000.00 KB
CPU utilization: Wall time   14.880    CPU time    0.520    CPU utilization   3.49 %

What, only 70 MB/sec?

Is it possible that the 2 extra spindles for the SATA drives make that
partition so much faster? Even though the disks and the RAID
configuration should be slower?
It feels like there is something fishy going on. Maybe the RAID 10  
implementation on the PERC/6e is crap?

Any pointers, suggestion, ideas?

I'm going to change the RAID 10 to a RAID 5 and test again and see  
what happens.

Cheers,
Henke


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-21 Thread Luke Lonergan
Hi Stephane,

On 7/21/08 1:53 AM, "Stephane Bailliez" <[EMAIL PROTECTED]> wrote:

>> I'd suggest RAID5, or even better, configure all eight disks as a JBOD
>> in the RAID adapter and run ZFS RAIDZ.  You would then expect to get
>> about 7 x 80 = 560 MB/s on your single query.
>> 
> Do you have a particular controller and disk hardware configuration in
> mind when you're suggesting RAID5 ?
> My understanding was it was more difficult to find the right hardware to
> get performance on RAID5 compared to RAID10.

If you're running RAIDZ on ZFS, the controller you have should be fine.
Just configure the HW RAID controller to treat the disks as JBOD (eight
individual disks), then make a single RAIDZ zpool of the eight disks.  This
will run them in a robust SW RAID within Solaris.  The fault management is
superior to what you would otherwise have in your HW RAID and the
performance should be much better.
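
The ZFS side of that is only a couple of commands; a sketch with invented
device names (the 8k recordsize is a common suggestion to match the Postgres
block size, not something required):

zpool create -f pgpool raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                             c1t4d0 c1t5d0 c1t6d0 c1t7d0
zfs create pgpool/pgdata
zfs set recordsize=8k pgpool/pgdata
zpool status pgpool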

- Luke


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-19 Thread Luke Lonergan
pgbench is unrelated to the workload you are concerned with if ETL/ELT and 
decision support / data warehousing queries are your target. 

Also - placing the xlog on dedicated disks is mostly irrelevant to data 
warehouse / decision support work or ELT.  If you need to maximize loading 
speed while concurrent queries are running, it may be necessary, but I think 
you'll be limited in load speed by CPU related to data formatting anyway.

The primary performance driver for ELT / DW is sequential transfer rate, thus 
the dd test at 2X memory.  With six data disks of this type, you should expect 
a maximum of around 6 x 80 = 480 MB/s.  With RAID10, depending on the raid 
adapter, you may need to have two or more IO streams to use all platters, 
otherwise your max speed for one query would be 1/2 that, or 240 MB/s.

I'd suggest RAID5, or even better, configure all eight disks as a JBOD in the 
RAID adapter and run ZFS RAIDZ.  You would then expect to get about 7 x 80 = 
560 MB/s on your single query.

That said, your single cpu on one query will only be able to scan that data at 
about 300 MB/s (try running a SELECT COUNT(*) against a table that is 2X memory 
size).
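
A sketch of that scan-rate check from psql (table name is a placeholder;
divide the relation size by the elapsed time to get MB/s):

\timing
SELECT pg_size_pretty(pg_relation_size('big_fact_table'));
SELECT count(*) FROM big_fact_table;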

- Luke

- Original Message -
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org 
Sent: Sat Jul 19 09:19:43 2008
Subject: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)


I'm trying to run a few basic tests to see what a current machine can 
deliver (typical workload ETL like, long running aggregate queries, 
medium size db ~100 to 200GB).

I'm currently checking the system (dd, bonnie++) to see if performance 
is within the normal range, but I'm having trouble relating it to 
anything known. Scouting the archives, there are more than a few people 
familiar with it, so if someone can have a look at those numbers and 
raise a flag where some numbers look very out of range for such a system, 
that would be appreciated. I also added some raw pgbench numbers at the end.

(Many thanks to Greg Smith, whose pages were extremely helpful to get 
started. Any mistake is mine.)

Hardware:

Sun Fire X4150 x64

2 Quad-Core Intel(R) Xeon(R) X5460 processor (2x6MB L2, 3.16 GHz, 1333 
MHz FSB)
16GB of memory (4x2GB PC2-5300 667 MHz ECC fully buffered DDR2 DIMMs)

6x 146GB 10K RPM SAS  in RAID10 - for os + data
2x 146GB 10K RPM SAS  in RAID1 - for xlog
Sun StorageTek SAS HBA Internal (Adaptec AAC-RAID)


OS is Ubuntu 7.10 x86_64 running 2.6.22-14
OS is on ext3
data is on xfs noatime
xlog is on ext2 noatime


data
$ time sh -c "dd if=/dev/zero of=bigfile bs=8k count=4000000 && sync"
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 152.359 seconds, 215 MB/s

real    2m36.895s
user    0m0.570s
sys     0m36.520s


$ time dd if=bigfile of=/dev/null bs=8k
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 114.723 seconds, 286 MB/s

real    1m54.725s
user    0m0.450s
sys     0m22.060s


xlog
$ time sh -c "dd if=/dev/zero of=bigfile bs=8k count=4000000 && sync"
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 389.216 seconds, 84.2 MB/s

real    6m50.155s
user    0m0.420s
sys     0m26.490s

$ time dd if=bigfile of=/dev/null bs=8k
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 294.556 seconds, 111 MB/s

real    4m54.558s
user    0m0.430s
sys     0m23.480s



bonnie++ -s 32g -n 256

data:
Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
lid-statsdb-1   32G 101188  98 202523  20 107642  13 88931  88 271576  19 980.7   2
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
            256 11429  93 +++++ +++ 17492  71 11097  91 +++++ +++  2473  11



xlog
Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
lid-statsdb-1   32G  62973  59  69981   5  35433   4 87977  85 119749   9 496.2   1
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
            256   551  99 +++++ +++ 300935  99   573  99 +++++ +++  1384  99

pgbench

postgresql 8.2.9 with data and xlog as mentioned above

postgresql.conf:
shared_buffers = 4GB
checkpoint_segments = 8
effective_cache_size = 8GB

Script running over scaling factor 1 to 1000 and running 3 times pgbench 
with "pgbench -t

Re: [PERFORM] 3ware vs Areca

2008-07-11 Thread Luke Lonergan
The Arecas are a lot faster than the 9550, more noticeable with disk counts
from 12 on up.  At 8 disks you may not see much difference.

The 3Ware 9650 is their answer to the Areca and it puts the two a lot closer.

FWIW – we got some Arecas at one point and had trouble getting them
configured and working properly.

- Luke


On 7/11/08 6:26 AM, "Jeff" <[EMAIL PROTECTED]> wrote:

> I've got a couple boxes with some 3ware 9550 controllers, and I'm less than
> pleased with performance on them.. Sequential access is nice, but start
> seeking around and you kick it in the gut.  (I've found posts on the internets
> about others having similar issues).  My last box with a 3ware I simply had it
> in jbod mode and used sw raid and it smoked the hw.
> 
> Anyway, anybody have experience in 3ware vs Areca - I've heard plenty of good
> anecdotal things that Areca is much better, just wondering if anybody here has
> firsthand experience.    It'll be plugged into about 8 10k rpm sata disks. 
> 
> thanks
>  
> --
> Jeff Trout <[EMAIL PROTECTED]>
> http://www.stuarthamm.net/
> http://www.dellsmartexitin.com/
> 
> 
>  
> 
> 



Re: [PERFORM] I/O on select count(*)

2008-05-22 Thread Luke Lonergan
Hi Hannu,

Interesting suggestion on the partial index!

I'll find out if we can extract our code that did the work.  It was simple but 
scattered in a few routines.

In concept it worked like this:

1 - Ignore if hint bits are unset, use them if set.  This affects heapam and 
vacuum I think.
2 - implement a cache for clog lookups based on the optimistic assumption that 
the data was inserted in bulk.  Put the cache one call away from heapgetnext()

I forget the details of (2).  As I recall, if we fall off of the assumption, 
the penalty for long scans gets large-ish (maybe 2X), but since when do people 
full-table scan when their updates/inserts are so scattered across TIDs?  
It's an obvious big win for DW work.

We also have a GUC to turn it off if needed, in which case a vacuum will write 
the hint bits.

- Luke

- Original Message -
From: Hannu Krosing <[EMAIL PROTECTED]>
To: Luke Lonergan
Cc: Pavan Deolasee <[EMAIL PROTECTED]>; Greg Smith <[EMAIL PROTECTED]>; Alvaro 
Herrera <[EMAIL PROTECTED]>; pgsql-performance@postgresql.org 

Sent: Thu May 22 12:10:02 2008
Subject: Re: [PERFORM] I/O on select count(*)

On Thu, 2008-05-15 at 10:52 +0800, Luke Lonergan wrote:
> BTW – we’ve removed HINT bit checking in Greenplum DB and improved the
> visibility caching which was enough to provide performance at the same
> level as with the HINT bit optimization, but avoids this whole “write
> the data, write it to the log also, then write it again just for good
> measure” behavior.
> 
> For people doing data warehousing work like the poster, this Postgres
> behavior is miserable.  It should be fixed for 8.4 for sure
> (volunteers?)

I might try it. I think I have told you about my ideas ;)
I plan to first do "caching" (for being able to do index-only scans
among other things) and then, if the cache works reliably, use the
"caching" code as the main visibility / MVCC mechanism.

Is Greenplum's code available, or should I roll my own?

> BTW – for the poster’s benefit, you should implement partitioning by
> date, then load each partition and VACUUM ANALYZE after each load.
>  You probably won’t need the date index anymore – so your load times
> will vastly improve (no indexes), you’ll store less data (no indexes)
> and you’ll be able to do simpler data management with the partitions.
> 
> You may also want to partition AND index if you do a lot of short
> range selective date predicates.  Example would be: partition by day,
> index on date field, queries selective on date ranges by hour will
> then select out only the day needed, then index scan to get the 
> hourly values.

If your queries allow it, you may try indexing on 
extract(hour from date)::int2
so the index may be smaller

storing the date as type abstime is another way to reduce index size.

> Typically time-oriented data is nearly time sorted anyway, so you’ll
> also get the benefit of a clustered index.


Hannu




Re: [PERFORM] Posible planner improvement?

2008-05-21 Thread Luke Lonergan
The problem is that the implied join predicate is not being propagated.  This 
is definitely a planner deficiency.

- Luke

- Original Message -
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org 
Sent: Wed May 21 07:37:49 2008
Subject: Re: [PERFORM] Posible planner improvement?

On Wednesday 21 May 2008, Albert Cervera Areny wrote:
> On Wednesday 21 May 2008, Mark Mielke wrote:
> > On Wednesday 21 May 2008, Richard Huxton wrote:
> > >> Albert Cervera Areny wrote:
> > >>> I've got a query similar to this:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
> > >>>
> > >>> That took > 84 minutes (the query was a bit longer but this is the
> > >>> part that made the difference) after a little change the query took
> > >>> ~1 second:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and
> > >>> t1.id = t2.id;
> > >>
> > >> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and
> > >> we'll see why it's better at the second one.
> >
> > Even if the estimates were off (they look a bit off for the first
> > table), the above two queries are logically identical, and I would
> > expect the planner to make the same decision for both.
> >
> > I am curious - what is the result of:
> >
> > select * from t1, t2 where t2.id > 158507 and t1.id = t2.id;
> >
> > Is it the same speed as the first or second, or is a third speed
> > entirely?
>
> Attached the same file with the third result at the end. The result is
> worse than the other two cases. Note that I've analyzed both tables but
> results are the same. One order of magnitude between the first two queries.

Sorry, it's not worse than the other two cases as shown in the file. However, 
after repetition the other two seem to decrease more than the third one, 
whose times vary a bit more and sometimes take up to 5 seconds.

Other queries are running on the same machine, so take the times with a grain of 
salt. What's clear is that there's always a big difference between the first and 
second queries.

>
> > If t1.id = t2.id, I would expect the planner to substitute them freely
> > in terms of identities?
> >
> > Cheers,
> > mark



-- 
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of disk space

2008-05-16 Thread Luke Lonergan
Try 'set enable_mergejoin=false' and see if you get a hash join.

- Luke

- Original Message -
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: Richard Huxton <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org 
Sent: Fri May 16 04:00:41 2008
Subject: Re: [PERFORM] Join runs for > 10 hours and then fills up >1.3TB of 
disk space

I'm expecting 9,961,914 rows returned. Each row in the big table  
should have a corresponding key in the smaller table; I want to  
basically "expand" the big table column list by one, by adding the  
appropriate key from the smaller table for each row in the big table.  
It's not a Cartesian product join.



On May 16, 2008, at 1:40 AM, Richard Huxton wrote:

> kevin kempter wrote:
>> Hi List;
>> I have a table with 9,961,914 rows in it (see the describe of  
>> bigtab_stats_fact_tmp14 below)
>> I also have a table with 7,785 rows in it (see the describe of  
>> xsegment_dim below)
>> I'm running the join shown below and it takes > 10 hours and  
>> eventually runs out of disk space on a 1.4TB file system
>
>> QUERY PLAN
>> --------------------------------------------------------------------
>>  Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
>
> Dumb question Kevin, but are you really expecting 3.2 billion rows  
> in the result-set? Because that's approaching 400GB of result-set  
> without any overheads.
>
> -- 
>  Richard Huxton
>  Archonet Ltd


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-14 Thread Luke Lonergan
BTW – we've removed HINT bit checking in Greenplum DB and improved the
visibility caching which was enough to provide performance at the same level
as with the HINT bit optimization, but avoids this whole "write the data,
write it to the log also, then write it again just for good measure"
behavior.

For people doing data warehousing work like the poster, this Postgres
behavior is miserable.  It should be fixed for 8.4 for sure (volunteers?)

BTW – for the poster's benefit, you should implement partitioning by date,
then load each partition and VACUUM ANALYZE after each load.  You probably
won't need the date index anymore – so your load times will vastly improve
(no indexes), you'll store less data (no indexes) and you'll be able to do
simpler data management with the partitions.

You may also want to partition AND index if you do a lot of short range
selective date predicates.  Example would be: partition by day, index on
date field, queries selective on date ranges by hour will then select out
only the day needed, then index scan to get the hourly values.  Typically
time-oriented data is nearly time sorted anyway, so you'll also get the
benefit of a clustered index.
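
A bare-bones sketch of that layout on 8.x (table, column and file names are
invented; constraint exclusion has to be enabled for the planner to skip
partitions):

CREATE TABLE events (id bigint, created_at timestamp, payload text);
-- one child per day, with a CHECK constraint the planner can use
CREATE TABLE events_2008_05_14 (
    CHECK (created_at >= '2008-05-14' AND created_at < '2008-05-15')
) INHERITS (events);
-- bulk load straight into the child, then analyze just that partition
COPY events_2008_05_14 FROM '/data/2008-05-14.csv' WITH CSV;
VACUUM ANALYZE events_2008_05_14;
-- optional: index the date column if you also need hour-range predicates
CREATE INDEX events_2008_05_14_ts ON events_2008_05_14 (created_at);
SET constraint_exclusion = on;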

- Luke


On 5/15/08 10:40 AM, "Pavan Deolasee" <[EMAIL PROTECTED]> wrote:

> On Thu, May 15, 2008 at 7:51 AM, Greg Smith <[EMAIL PROTECTED]> wrote:
>> >
>> >
>> > So is vacuum helpful here because it will force all that to happen in one
>> > batch?  To put that another way:  if I've run a manual vacuum, is it true
>> > that it will have updated all the hint bits to XMIN_COMMITTED for all the
>> > tuples that were all done when the vacuum started?
>> >
> 
> Yes. For that matter, even a plain SELECT or count(*) on the entire
> table is good enough. That will check every tuple for visibility and
> set its hint bits.
> 
> Another point to note is that the hint bits are checked and set on a
> per tuple basis. So especially during index scan, the same heap page
> may get rewritten many times. I had suggested in the past that
> whenever we set hint bits for a tuple, we should check all other
> tuples in the page and set their hint bits too to avoid multiple
> writes of the same page. I guess the idea got rejected because of lack
> of benchmarks to prove the benefit.
> 
> Thanks,
> Pavan
> 
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com
> 
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 



Re: [PERFORM] Best practice to load a huge table from ORACLE to PG

2008-04-26 Thread Luke Lonergan
Yep – just do something like this within sqlplus (from
http://www.dbforums.com/showthread.php?t=350614):

set termout off
set hea off
set pagesize 0

spool c:\whatever.csv

select a.a||','||a.b||','||a.c
from a
where a.a='whatever';

spool off

COPY is the fastest approach to get it into PG.
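
On the Postgres side, the load half of that might look like this (table and
path are placeholders):

-- server-side COPY if the file sits on the database host...
COPY mytable FROM '/tmp/whatever.csv' WITH CSV;
-- ...or client-side through psql if it does not:
-- \copy mytable from 'whatever.csv' with csv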

- Luke

On 4/26/08 6:25 AM, "Adonias Malosso" <[EMAIL PROTECTED]> wrote:

> Hi All,
> 
> I'd like to know what's the best practice to LOAD a 70 million row, 101
> column table 
> from ORACLE to PGSQL.
> 
> The current approach is to dump the data in CSV and then COPY it to
> PostgreSQL.
> 
> Does anyone have a better idea?
> 
> 
> Regards
> Adonias Malosso
> 



Re: [PERFORM] Group by more efficient than distinct?

2008-04-20 Thread Luke Lonergan
Hi Francisco,

Generally, PG sorting is much slower than hash aggregation for performing
the distinct operation.  There may be small sizes where this isn't true, but
for large amounts of data (in-memory or not), hash agg (used most often, but
not always, by GROUP BY) is faster.

We've implemented a special optimization to PG sorting that does the
distinct processing within the sort, instead of afterward, but it's limited
to some small-ish number (10,000) of distinct values due to its use of a
memory- and processing-intensive heap.

So, you're better off using GROUP BY and making sure that the planner is
using hash agg to do the work.

- Luke 
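
A quick way to check which path the planner picks (names are placeholders;
raising work_mem for the session makes hash aggregation more likely on
larger inputs):

EXPLAIN ANALYZE SELECT col_a, col_b FROM big_table GROUP BY col_a, col_b;
EXPLAIN ANALYZE SELECT DISTINCT col_a, col_b FROM big_table;
-- if the GROUP BY plan still shows a Sort instead of HashAggregate:
SET work_mem = '256MB';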


On 4/17/08 8:46 PM, "Francisco Reyes" <[EMAIL PROTECTED]> wrote:

> I am trying to get a distinct set of rows from 2 tables.
> After looking at someone else's query I noticed they were doing a group by
> to obtain the unique list.
> 
> After comparing on multiple machines with several tables, it seems using
> group by to obtain a distinct list is substantially faster than using
> select distinct.
> 
> Is there any disadvantage of using "group by" to obtain a unique list?
> 
> On a small dataset the difference was about 20 percent.
> 
> Group by
>  HashAggregate  (cost=369.61..381.12 rows=1151 width=8) (actual
> time=76.641..85.167 rows=2890 loops=1)
> 
> Distinct
>  Unique  (cost=1088.23..1174.53 rows=1151 width=8) (actual
> time=90.516..140.123 rows=2890 loops=1)
> 
> Although I don't have the numbers here with me, a similar result was
> obtained against a query that would return 100,000 rows. 20% and more
> speed difference between "group by" and "select distinct".
> 
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 



Re: [PERFORM] "Slow" query or just "Bad hardware"?

2008-03-27 Thread Luke Lonergan
You might try turning "enable_bitmapscan" off; that will avoid the full
index scan and creation of the bitmap.

- Luke


On 3/27/08 8:34 AM, "Jesper Krogh" <[EMAIL PROTECTED]> wrote:

> Hi
> 
> I have a table with around 10 million entries.  The webpage rendered hits
> at most 200 records, which are distributed well in the 10m with an average
> of 2 "references" per entry.
> 
> Is there any way to speed this query up more than it already is? Yes, running
> it subsequently it is blazingly fast, but with a view of around 200/10m we
> most often don't hit the same query again.
> 
> 
> # explain analyze SELECT "me"."created", "me"."created_initials",
> "me"."updated", "me"."updated_initials", "me"."start_time",
> "me"."end_time", "me"."notes", "me"."id", "me"."sequence_id",
> "me"."database", "me"."name", "numbers"."reference_id",
> "numbers"."evidence" FROM "reference" "me" LEFT JOIN "number" "numbers" ON
> ( "numbers"."reference_id" = "me"."id" ) WHERE ( "me"."sequence_id" IN (
> 34284, 41503, 42274, 42285, 76847, 78204, 104721, 126279, 274770, 274790,
> 274809, 305346, 307383, 307411, 309691, 311362, 344930, 352530, 371033,
> 371058, 507790, 517521, 517537, 517546, 526883, 558976, 4894317, 4976383,
> 1676203, 4700800, 688803, 5028679, 5028694, 5028696, 5028684, 5028698,
> 5028701, 5028676, 5028682, 5028686, 5028692, 5028689, 3048683, 5305427,
> 5305426, 4970187, 4970216, 4970181, 4970208, 4970196, 4970226, 4970232,
> 4970201, 4970191, 4970222, 4350307, 4873618, 1806537, 1817367, 1817432,
> 4684270, 4981822, 3172776, 4894299, 4894304, 4700798, 1120990, 4981817,
> 4831109, 4831036, 4831068, 4831057, 4831105, 4831038, 4831044, 4831081,
> 4831063, 4831051, 4831086, 4831049, 4831071, 4831075, 4831114, 4831093,
> 2635142, 4660208, 4660199, 4912338, 4660150, 4662011, 5307782, 4894286,
> 4894292, 4894296, 4894309, 4894313, 1428388, 1932290, 5306082, 2010148,
> 3979647, 4382006, 4220374, 1880794, 1526588, 774838, 1377100, 969316,
> 1796618, 1121046, 4662009, 963535, 5302610, 1121105, 688700, 688743,
> 688836, 688763, 688788, 1056859, 2386006, 2386015, 2386023, 4265832,
> 4231262, 4265743, 5302612, 1121056, 1121090, 1121074, 688659, 688650 ) )
> ORDER BY "ecnumbers"."reference_id";
>
>  QUERY PLAN
> ----------------------------------------------------------------------------
>  Sort  (cost=56246.18..56275.20 rows=11606 width=299) (actual
> time=2286.900..2287.215 rows=389 loops=1)
>Sort Key: numbers.reference_id
>->  Nested Loop Left Join  (cost=388.48..55462.63 rows=11606 width=299)
> (actual time=475.071..2284.502 rows=389 loops=1)
>  ->  Bitmap Heap Scan on reference me  (cost=388.48..23515.97
> rows=11606 width=191) (actual time=451.245..1583.966 rows=389
> loops=1)
>Recheck Cond: (sequence_id = ANY
> ('{34284,41503,42274,42285,76847,78204,104721,126279,274770,274790,274809,3053
> 46,307383,307411,309691,311362,344930,352530,371033,371058,507790,517521,51753
> 7,517546,526883,558976,4894317,4976383,1676203,4700800,688803,5028679,5028694,
> 5028696,5028684,5028698,5028701,5028676,5028682,5028686,5028692,5028689,304868
> 3,5305427,5305426,4970187,4970216,4970181,4970208,4970196,4970226,4970232,4970
> 201,4970191,4970222,4350307,4873618,1806537,1817367,1817432,4684270,4981822,31
> 72776,4894299,4894304,4700798,1120990,4981817,4831109,4831036,4831068,4831057,
> 4831105,4831038,4831044,4831081,4831063,4831051,4831086,4831049,4831071,483107
> 5,4831114,4831093,2635142,4660208,4660199,4912338,4660150,4662011,5307782,4894
> 286,4894292,4894296,4894309,4894313,1428388,1932290,5306082,2010148,3979647,43
> 82006,4220374,1880794,1526588,774838,1377100,969316,1796618,1121046,4662009,96
> 3535,5302610,1121105,688700,688743,688836,688763,688788,1056859,2386006,238601
> 5,2386023,4265832,4231262,4265743,5302612,1121056,1121
>  090,1121074,688659,688650}'::integer[]))
>->  Bitmap Index Scan on reference_seq_idx
> (cost=0.00..385.58 rows=

Re: [PERFORM] postgresql is slow with larger table even it is in RAM

2008-03-27 Thread Luke Lonergan
So your table is about 80 MB in size, or perhaps 120 MB if it fits in
shared_buffers.  You can check it using "SELECT
pg_size_pretty(pg_relation_size('mytable'))"

- Luke  


On 3/26/08 4:48 PM, "Peter Koczan" <[EMAIL PROTECTED]> wrote:

> FWIW, I did a select count(*) on a table with just over 30 rows,
> and it only took 0.28 sec.



Re: [PERFORM] postgresql is slow with larger table even it is in RAM

2008-03-25 Thread Luke Lonergan
Hello Sathiya,

1st:  you should not use a ramdisk for this, it will slow things down as
compared to simply having the table on disk.  Scanning it the first time
when on disk will load it into the OS IO cache, after which you will get
memory speed.

2nd: you should expect the "SELECT COUNT(*)" to run at a maximum of about
350 – 600 MB/s (depending on PG version and CPU speed).  It is CPU speed
limited to that rate of counting rows no matter how fast your IO is.

So, for your 700 MB table, you should expect a COUNT(*) to run in about 1-2
seconds best case.  This will approximate the speed at which other queries
can run against the table.

- Luke


On 3/25/08 1:35 AM, "sathiya psql" <[EMAIL PROTECTED]> wrote:

> Dear Friends,
>  I have a table with 32 lakh records in it. Table size is nearly 700 MB,
> and my machine has 1 GB + 256 MB RAM. I had created the tablespace in RAM,
> and then created this table in this RAM.
> 
> So now everything is in RAM. If I do a count(*) on this table it returns
> 327600 in 3 seconds. Why is it taking 3 seconds? Because I am sure that no
> disk I/O is happening. (Using vmstat I had confirmed that no disk I/O is
> happening; swap is also not used.)
> 
> Any Idea on this ???
> 
> I searched a lot in newsgroups ... can't find relevant things ( because
> everywhere they are speaking about disk access speed, here i don't want to
> worry about disk access )
> 
> If required i will give more information on this.
> 
> 
> 



Re: [PERFORM] How to choose a disc array for Postgresql?

2008-03-02 Thread Luke Lonergan
The Dell MD1000 is good.  The most trouble you will have will be with the RAID 
adapter - to get the best support I suggest trying to buy the Dell PERC 5/E 
(also an LSI) - that way you'll get drivers that work and are supported.

Latest seq scan performance I've seen on redhat 5 is 400 MB/s on eight drives 
in RAID10 after setting linux max readahead to 16384 (blockdev --setra 16384) 
and 220 without.

- Original Message -
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org 
Sent: Sun Mar 02 02:37:37 2008
Subject: [PERFORM] How to choose a disc array for Postgresql?

I am moving our small business database application supporting 
a 24/7 animal hospital to use 8.0.15 from
7.4.19 (it will not support 8.1, 8.2, or 8.3).

Now, we can choose a new disc array. SATA
seems cheaper and you can get more discs, but
I want to stay with SCSI.  Any good reasons to
choose SATA over SCSI?

I need to consider a vendor for the new disc array (6
to 8 discs). I've not been completely pleased with the local vendor
(in the San Francisco Bay Area), so I am considering using
Dell storage connecting to a retail-version LSI MegaRAID 320-2X card.

Anyone use any vendors that have been supportive of Postgresql?

Thanks for your help/feedback.

Steve





Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Luke Lonergan
Improvements are welcome, but to compete in the industry, loading will need to 
speed up by a factor of 100.

Note that Bizgres loader already does many of these ideas and it sounds like 
pgloader does too.

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Dimitri Fontaine [mailto:[EMAIL PROTECTED]
Sent:   Wednesday, February 06, 2008 12:41 PM Eastern Standard Time
To: pgsql-performance@postgresql.org
Cc: Greg Smith
Subject:Re: [PERFORM] Benchmark Data requested --- pgloader CE design 
ideas

On Wednesday, 06 February 2008, Greg Smith wrote:
> If I'm loading a TB file, odds are good I can split that into 4 or more
> vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders
> at once, and get way more than 1 disk worth of throughput reading.

pgloader already supports starting at any input file line number, and limiting 
itself to any number of reads:

  -C COUNT, --count=COUNT
number of input lines to process
  -F FROMCOUNT, --from=FROMCOUNT
number of input lines to skip

So you could already launch 4 pgloader processes with the same configuration 
file but different command-line arguments. If there's interest/demand, it's 
easy enough for me to add those parameters as file configuration knobs too.
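
In other words, something along these lines for a 4,000,000-line input
(line counts invented; each worker otherwise gets the same configuration and
arguments you would normally pass):

pgloader --from 0       --count 1000000 &
pgloader --from 1000000 --count 1000000 &
pgloader --from 2000000 --count 1000000 &
pgloader --from 3000000 --count 1000000 &
wait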

Still you have to pay for client to server communication instead of having the 
backend read the file locally, but now maybe we begin to compete?

Regards,
-- 
dim


Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Luke Lonergan
Hi Greg,

On 2/6/08 7:56 AM, "Greg Smith" <[EMAIL PROTECTED]> wrote:

> If I'm loading a TB file, odds are good I can split that into 4 or more
> vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders
> at once, and get way more than 1 disk worth of throughput reading.  You
> have to play with the exact number because if you push the split too far
> you introduce seek slowdown instead of improvements, but that's the basic
> design I'd like to see one day.  It's not parallel loading that's useful
> for the cases I'm thinking about until something like this comes around.

Just load 4 relfiles.  You have to be able to handle partial relfiles, which
changes the storage mgmt a bit, but the benefits are easier to achieve.

- Luke


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Benchmark Data requested

2008-02-04 Thread Luke Lonergan
Hi Greg,

On 2/4/08 12:09 PM, "Greg Smith" <[EMAIL PROTECTED]> wrote:

> Do you have any suggestions on how people should run TPC-H?  It looked
> like a bit of work to sort through how to even start this exercise.

To run "TPC-H" requires a license to publish, etc.

However, I think you can use their published data and query generation kit
to run the queries, which aren't the benchmark per-se.  That's what the
Monet/X100 people did.
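
Roughly like this with the kit's dbgen/qgen tools (flags recalled from
memory -- check the kit's README; scale factor 1 is about 1 GB of data):

./dbgen -s 1                              # writes the *.tbl flat files
for q in $(seq 1 22); do
    DSS_QUERY=../queries ./qgen $q > q$q.sql
done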

- Luke


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Benchmark Data requested

2008-02-04 Thread Luke Lonergan
Hi Simon,

On 2/4/08 11:07 AM, "Simon Riggs" <[EMAIL PROTECTED]> wrote:

>> "executor-executor" test and we/you should be sure that the PG planner has
>> generated the best possible plan.
> 
> If it doesn't then I'd regard that as a performance issue in itself.

Agreed, though that's two problems to investigate - I think the Monet/X100
stuff is clean in that it's a pure executor test.
 
> You maybe right, but I want to see where it hurts us the most.

You'll see :-)
 
>> The only way to cure this is to work on more rows than one at a time.
> 
> Do you have any results to show that's true, or are you just referring
> to the Cray paper? (Which used fixed length tuples and specific vector
> hardware).

No paper referenced, just inference from the results and their (and others')
conclusions about locality and re-use.  It's a similar enough situation to
scientific programming with vector processors versus cache-based superscalar
that these are the right conclusions.  We've done the profiling to look at
cache misses and have some data to back it up as well.
 
> (With regard to benchmarks, I'd rather not download Monet at all. Helps
> avoid legal issues around did-you-look-at-the-code questions.)

None of us have looked at the code or downloaded it.  There are a number of
presentations out there for Monet/X100 to see what their results are.

- Luke


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Benchmark Data requested

2008-02-04 Thread Luke Lonergan
Hi Simon,

Note that MonetDB/X100 does not have a SQL optimizer, they ran raw
hand-coded plans.  As a consequence, these comparisons should be taken as an
"executor-executor" test and we/you should be sure that the PG planner has
generated the best possible plan.

That said, we've already done the comparisons internally and they've got a
good point to make about L2 cache use and removal of unnecessary
abstractions in the executor.  We've been aware of this since 2005/6 and
have been slowly working these ideas into our/PG executor.

Bottom line: it's a good thing to work to get close to the X100/Monet
executor with a more general purpose DB.  PG is a looong way from being
comparable, mostly due to poor L2 D-cache locality and I-cache thrashing in
the executor.  The only way to cure this is to work on more rows than one at
a time.

- Luke 


On 2/4/08 10:37 AM, "Simon Riggs" <[EMAIL PROTECTED]> wrote:

> Can I ask for some help with benchmarking?
> 
> There are some results here that show PostgreSQL is slower in some cases
> than Monet and MySQL. Of course these results were published immediately
> prior to 8.2 being released, plus run out-of-the-box, so without even
> basic performance tuning.
> 
> Would anybody like to repeat these tests with the latest production
> versions of these databases (i.e. with PGSQL 8.3), and with some
> sensible tuning settings for the hardware used? It will be useful to get
> some blind tests with more sensible settings.
> 
> http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/
> 
> Multiple runs from different people/different hardware is useful since
> they help to iron-out differences in hardware and test methodology. So
> don't worry if you see somebody else doing this also.
> 
> Thanks,


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] scheduler

2008-01-22 Thread Luke Lonergan
Deadline works best for us.  The new AS is getting better, but last we
tried there were issues with it.

- Luke 
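
For reference, the elevator can be checked and switched per block device at
runtime (device name is only an example):

cat /sys/block/sda/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler
# or system-wide at boot with the kernel parameter: elevator=deadline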

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Adrian Moisey
> Sent: Monday, January 21, 2008 11:01 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] scheduler
> 
> Hi
> 
> Which scheduler is recommended for a box that is dedicated to 
> running postgres?
> 
> I've asked google and found no answers.
> 
> --
> Adrian Moisey
> System Administrator | CareerJunction | Your Future Starts Here.
> Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
> Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 
> 21 686 6842
> 
> ---(end of 
> broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq
> 

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] TB-sized databases

2007-11-30 Thread Luke Lonergan
Hi Peter,

If you run into a scaling issue with PG (you will at those scales, 1TB+), you
can deploy Greenplum DB, which is PG 8.2.5 compatible.  A large internet
company (look for press soon) is in production with a 150TB database on a
system capable of doing 400TB, and we have others in production at 60TB,
40TB, etc.  We can provide references when needed - note that we had 20
successful customer references supporting Gartner's magic quadrant report on
data warehouses, which put Greenplum in the "upper visionary" area of the
magic quadrant - that only happens if your customers can scale (see:
http://www.esj.com/business_intelligence/article.aspx?EditorialsID=8712)

In other words, no matter what happens you'll be able to scale up with your
Postgres strategy.

- Luke


On 11/26/07 10:44 AM, "Pablo Alcaraz" <[EMAIL PROTECTED]> wrote:

> I had a client that tried to use Ms Sql Server to run a 500Gb+ database.
> The database simply collapsed. They switched to Teradata and it is
> running good. This database has now 1.5Tb+.
> 
> Currently I have clients using postgresql huge databases and they are
> happy. In one client's database the biggest table has 237Gb+ (only 1
> table!) and postgresql run the database without problem using
> partitioning, triggers and rules (using postgresql 8.2.5).
> 
> Pablo
> 
> Peter Koczan wrote:
>> Hi all,
>> 
>> I have a user who is looking to store 500+ GB of data in a database
>> (and when all the indexes and metadata are factored in, it's going to
>> be more like 3-4 TB). He is wondering how well PostgreSQL scales with
>> TB-sized databases and what can be done to help optimize them (mostly
>> hardware and config parameters, maybe a little advocacy). I can't
>> speak on that since I don't have any DBs approaching that size.
>> 
>> The other part of this puzzle is that he's torn between MS SQL Server
>> (running on Windows and unsupported by us) and PostgreSQL (running on
>> Linux...which we would fully support). If any of you have ideas of how
>> well PostgreSQL compares to SQL Server, especially in TB-sized
>> databases, that would be much appreciated.
>> 
>> We're running PG 8.2.5, by the way.
>> 
>> Peter
>> 




Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

2007-11-07 Thread Luke Lonergan
On 11/7/07 10:21 PM, "Gregory Stark" <[EMAIL PROTECTED]> wrote:

>> part=# explain SELECT * FROM n_traf ORDER BY date_time LIMIT 1;
>>   QUERY PLAN
>> -
>> 
>> Limit  (cost=198367.14..198367.15 rows=1 width=20)
>>   ->  Sort  (cost=198367.14..200870.92 rows=1001510 width=20)
>> Sort Key: public.n_traf.date_time
>> ->  Result  (cost=0.00..57464.92 rows=1001510 width=20)
>>   ->  Append  (cost=0.00..57464.92 rows=1001510 width=20)
>> ->  Index Scan using n_traf_date_time_login_id on n_traf
>> (cost=0.00..66.90 rows=1510 width=20)
> 
> That looks suspicious. There's likely no good reason to be using the index
> scan unless it avoids the sort node above the Append node. That's what I hope
> to do by having the Append executor code do what's necessary to maintain the
> order.

Yah - the way it works in GPDB is that you get a non-sorting plan with an
index scan below the parent - that was the point of the fix. Hmm.
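
As an illustrative aside (the child table name below is hypothetical, and the
same date_time index is assumed to exist on each child), the difference can be
seen by comparing the parent-level query with a direct query against one child:

  -- Parent-level query: the Append result is sorted above the child scans.
  EXPLAIN SELECT * FROM n_traf ORDER BY date_time LIMIT 1;

  -- Querying one child directly (illustrative name) lets its date_time
  -- index satisfy the ORDER BY ... LIMIT without a sort node.
  EXPLAIN SELECT * FROM n_traf_y2007m01 ORDER BY date_time LIMIT 1;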

- Luke





Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

2007-10-31 Thread Luke Lonergan
BTW - Mark has volunteered to work a Postgres patch together.  Thanks Mark!

- Luke


On 10/29/07 10:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:

> Luke Lonergan wrote:
>> Sure - it's here:
>>   http://momjian.us/mhonarc/patches_hold/msg00381.html
>> 
>>   
> 
> To clarify - we've fixed this in Greenplum db - the patch as submitted
> is (hopefully) a hint about how to fix it in Postgres, rather than a
> working patch... as it's full of non-postgres functions and macros:
> 
> CdbPathLocus_MakeHashed
> cdbpathlocus_pull_above_projection
> cdbpullup_findPathKeyItemInTargetList
> cdbpullup_makeVar
> cdbpullup_expr
> 
> Cheers
> 
> Mark





Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

2007-10-29 Thread Luke Lonergan
Sure - it's here:
  http://momjian.us/mhonarc/patches_hold/msg00381.html

- Luke


On 10/29/07 6:40 AM, "Gregory Stark" <[EMAIL PROTECTED]> wrote:

> "Luke Lonergan" <[EMAIL PROTECTED]> writes:
> 
>> And I repeat - 'we fixed that and submitted a patch' - you can find it in the
>> unapplied patches queue.
> 
> I can't find this. Can you point me towards it?
> 
> Thanks





Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

2007-10-27 Thread Luke Lonergan
Works great - plans no longer sort, but rather use indices as expected.  It's 
in use in Greenplum now.

It's a simple approach, should easily extend from gpdb to postgres. The patch 
is against gpdb so someone needs to 'port' it.

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Simon Riggs [mailto:[EMAIL PROTECTED]
Sent:   Saturday, October 27, 2007 05:34 PM Eastern Standard Time
To: Luke Lonergan
Cc: Heikki Linnakangas; Anton; pgsql-performance@postgresql.org
Subject:Re: [PERFORM] partitioned table and ORDER BY indexed_field
DESC LIMIT 1

On Sat, 2007-10-27 at 15:12 -0400, Luke Lonergan wrote:
> And I repeat - 'we fixed that and submitted a patch' - you can find it
> in the unapplied patches queue.

I got the impression it was a suggestion rather than a tested patch,
forgive me if that was wrong.

Did the patch work? Do you have timings/different plan?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com



Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

2007-10-27 Thread Luke Lonergan
I just read the lead ups to this post - didn't see Tom and Greg's comments.

The approach we took was to recognize the ordering of child nodes and propagate 
that to the append in the special case of only one child (after constraint
exclusion).  This is the most common use-case in 'partitioning', and so is an
easy, high-payoff fix with a small amount of code.

I'd suggest we take this approach while also considering a more powerful set of 
append merge capabilities.
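
As a small illustrative sketch of the single-child-after-CE case (all names
below are invented for the example; no planner output is shown):

  -- Hypothetical 8.2-style partitioning by month.
  CREATE TABLE measurements (logdate date, value int);
  CREATE TABLE measurements_y2007m10 (
      CHECK (logdate >= DATE '2007-10-01' AND logdate < DATE '2007-11-01')
  ) INHERITS (measurements);
  CREATE INDEX measurements_y2007m10_logdate
      ON measurements_y2007m10 (logdate);

  SET constraint_exclusion = on;

  -- The WHERE clause matches exactly one child's CHECK constraint, so after
  -- constraint exclusion only that child (plus the normally empty parent)
  -- remains under the Append - the case whose child index ordering the
  -- GPDB change propagates upward.
  EXPLAIN SELECT * FROM measurements
  WHERE logdate >= DATE '2007-10-01' AND logdate < DATE '2007-11-01'
  ORDER BY logdate DESC LIMIT 1;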

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-----
From:   Luke Lonergan [mailto:[EMAIL PROTECTED]
Sent:   Saturday, October 27, 2007 03:14 PM Eastern Standard Time
To: Heikki Linnakangas; Anton
Cc: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC 
LIMIT 1

And I repeat - 'we fixed that and submitted a patch' - you can find it in the 
unapplied patches queue.

The patch isn't ready for application, but someone can quickly implement it I'd 
expect.

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Heikki Linnakangas [mailto:[EMAIL PROTECTED]
Sent:   Saturday, October 27, 2007 05:20 AM Eastern Standard Time
To: Anton
Cc: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC 
LIMIT 1

Anton wrote:
> I repost here my original question "Why it no uses indexes?" (on
> partitioned table and ORDER BY indexed_field DESC LIMIT 1), if you
> mean that you miss this discussion.

As I said back then:

The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
below the append node.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

2007-10-27 Thread Luke Lonergan
And I repeat - 'we fixed that and submitted a patch' - you can find it in the 
unapplied patches queue.

The patch isn't ready for application, but someone can quickly implement it I'd 
expect.

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Heikki Linnakangas [mailto:[EMAIL PROTECTED]
Sent:   Saturday, October 27, 2007 05:20 AM Eastern Standard Time
To: Anton
Cc: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC 
LIMIT 1

Anton wrote:
> I repost here my original question "Why it no uses indexes?" (on
> partitioned table and ORDER BY indexed_field DESC LIMIT 1), if you
> mean that you miss this discussion.

As I said back then:

The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
below the append node.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Barcelona vs Tigerton

2007-09-11 Thread Luke Lonergan
Mindaugas,

The Anandtech results appear to me to support a 2.5 GHz Barcelona
performing better than the available Intel CPUs overall.

If you can wait for the 2.5 GHz AMD parts to come out, they'd be a
better bet IMO especially considering 4 sockets.  In fact, have you seen
quad QC Intel benchmarks?

BTW - Can someone please get Anand a decent PG benchmark kit? :-)

At least we can count on excellent PG bench results from the folks at
Tweakers.

- Luke

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Mindaugas
> Sent: Tuesday, September 11, 2007 12:58 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Barcelona vs Tigerton
> 
>   Hello,
> 
>   Now that both 4x4s are out, it's time for us to decide which one 
> should be better for our PostgreSQL and Oracle. Especially 
> for Oracle we really need such a server to squeeze 
> everything from the Oracle licenses. Both of the databases handle 
> an OLTP type of load.
>   Since we plan to buy a 4U HP DL580 or 585 and only very few 
> of them, power ratings are not very critical in our case.
> 
>   First benchmarks 
> (http://www.anandtech.com/IT/showdoc.aspx?i=3091) show that 
> Intel still has more raw CPU power but Barcelona scales much 
> better and also has better memory bandwidth which I believe 
> is quite critical with 16 cores and DB usage pattern.
>   On the other hand Intel's X7350 (2.93GHz) has almost 50% 
> advantage in CPU frequency against 2GHz Barcelona.
> 
>   Regards,
> 
>   Mindaugas
> 
>   P.S. tweakers.net does not have both of those yet? Test 
> results far away? :)
> 
> 




Re: [PERFORM] random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

2007-09-11 Thread Luke Lonergan
Greg, 

> I think this seems pretty impractical for regular 
> (non-bitmap) index probes though. You might be able to do it 
> sometimes but not very effectively and you won't know when it 
> would be useful.

Maybe so, though I think it's reasonable to get multiple actuators going
even if the seeks are truly random.  It's a dynamic / tricky business to
determine how many pending seeks to post, but it should be roughly close
to the number of disks in the pool IMO.

> I think what this means is that there are actually *three* 
> kinds of i/o: 1) Sequential which means you get the full 
> bandwidth of your drives * the number of spindles; 2) Random 
> which gets you 1 block per seek latency regardless of how 
> many spindles you have; and 3) Random but with prefetch which 
> gets you the random bandwidth above times the number of spindles.

Perhaps so, though I'm more optimistic that prefetch would help most
random seek situations.

For reasonable amounts of concurrent usage this point becomes moot - we
get the benefit of multiple backends doing seeking anyway, but I think
if we do dynamic prefetch right it would degenerate gracefully in those
circumstances.

> The extra spindles speed up sequential i/o too so the ratio 
> between sequential and random with prefetch would still be 
> about 4.0. But the ratio between sequential and random 
> without prefetch would be even higher.

Right :-)

- Luke




Re: [PERFORM] random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

2007-09-10 Thread Luke Lonergan
Hi Josh,

On 9/10/07 2:26 PM, "Josh Berkus" <[EMAIL PROTECTED]> wrote:

> So, when is this getting contributed?  ;-)

Yes, that's the right question to ask :-)

One feeble answer: "when we're not overwhelmed by customer activity"...

- Luke





Re: [PERFORM] random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

2007-09-10 Thread Luke Lonergan
Hi Mark, Greg,

On 9/10/07 3:08 PM, "Mark Mielke" <[EMAIL PROTECTED]> wrote:

> One suggestion: The plan is already in a tree. With some dependency analysis,
> I assume the tree could be executed in parallel (multiple threads or event
> triggered entry into a state machine), and I/O to fetch index pages or table
> pages could be scheduled in parallel. At this point, AIO becomes necessary to
> let the underlying system (and hardware with tagged queueing?) schedule which
> pages should be served best first.

Right now the pattern for index scan goes like this:

- Find qualifying TID in index
- Seek to TID location in relfile
- Acquire tuple from relfile, return

When the tuples are widely distributed in the table, as is the case with a
very selective predicate against an evenly distributed attribute on a
relation 2x larger than the I/O cache + bufcache, this pattern will result
in effectively "random I/O".  In actual fact, the use of the in-memory
bitmap index will make the I/Os sequential, but sparse, which is another
version of "random" if the sequential I/Os are larger than the
gather/scatter I/O aggregation in the OS scheduler (say 1MB).  This is a
very common circumstance for DSS / OLAP / DW workloads.

For plans that qualify with the above conditions, the executor will issue
blocking calls to lseek(), which will translate to a single disk actuator
moving to the needed location in seek_time, approximately 8ms.  The
seek_time for a single query will not improve with the increase in number of
disks in an underlying RAID pool, so we can do about 1000/8 = 125 seeks per
second no matter what I/O subsystem we have.

If we implement AIO and allow for multiple pending I/Os used to prefetch
groups of qualifying tuples, basically a form of random readahead, we can
improve the throughput for any given query by taking advantage of multiple
disk actuators.  This will work for RAID5, RAID10 and other disk pooling
mechanisms because the lseek() will be issued as parallel events.  Note that
the same approach would also work to speed sequential access by overlapping
compute and I/O.

- Luke





Re: [PERFORM] random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

2007-09-10 Thread Luke Lonergan
It should be a lot higher; something like 10-15 is closer to accurate.

Increasing the number of disks in a RAID actually makes the number higher,
not lower.  Until Postgres gets AIO + the ability to post multiple
concurrent IOs on index probes, random IO does not scale with increasing
disk count, but sequential does, thus the increasing "random page cost" as
the RAID gets faster.

The reason to change the number is to try to discourage the planner from
choosing index scans too aggressively.  We (GP) have implemented something
we call "Adaptive Nested Loop" to replace a nested loop + index scan with a
hash join when the selectivity estimates are off in order to improve this
behavior.  We also run with a "random_page_cost=100" because we generally
run on machines with fast sequential I/O.
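
For illustration only - the table and query below are placeholders rather than
anything from this thread, and 100 is simply the value mentioned above for
machines with fast sequential I/O:

  -- Placeholder query; substitute one of your own index-heavy statements.
  SET random_page_cost = 100;   -- discourage index scans for this session
  EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
  RESET random_page_cost;

  -- For a cluster-wide change, set it in postgresql.conf instead, e.g.
  --   random_page_cost = 10    (toward the 10-15 range suggested above)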

- Luke 


On 9/10/07 12:25 PM, "Carlo Stonebanks" <[EMAIL PROTECTED]>
wrote:

> Can anyone answer this for me: Although I realize my client's disk subsystem
> (SCSI/RAID Smart Array E200 controller using RAID 1) is less than
> impressive - is the default setting of 4.0 realistic or could it be lower?
> 
> Thanks!
> 
> 





Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

2007-08-24 Thread Luke Lonergan
Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
it?

This enables the use of index scan of a child table by recognizing sort
order of the append node.  Kurt Harriman did the work.

- Luke

Index: cdb-pg/src/backend/optimizer/path/indxpath.c
===
RCS file: 
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/path/indxpath.c,v
diff -u -N -r1.22 -r1.22.2.1
--- cdb-pg/src/backend/optimizer/path/indxpath.c25 Apr 2007 22:07:21
-1.22
+++ cdb-pg/src/backend/optimizer/path/indxpath.c10 Aug 2007 03:41:15
-1.22.2.1
@@ -379,8 +379,51 @@
 index_pathkeys = build_index_pathkeys(root, index,
   ForwardScanDirection,
   true);
-useful_pathkeys = truncate_useless_pathkeys(root, rel,
-index_pathkeys);
+/*
+ * CDB: For appendrel child, pathkeys contain Var nodes in
terms 
+ * of the child's baserel.  Transform the pathkey list to refer
to 
+ * columns of the appendrel.
+ */
+if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+{
+AppendRelInfo  *appinfo = NULL;
+RelOptInfo *appendrel = NULL;
+ListCell   *appcell;
+CdbPathLocus    notalocus;
+
+/* Find the appendrel of which this baserel is a child. */
+foreach(appcell, root->append_rel_list)
+{
+appinfo = (AppendRelInfo *)lfirst(appcell);
+if (appinfo->child_relid == rel->relid)
+break;
+}
+Assert(appinfo);
+appendrel = find_base_rel(root, appinfo->parent_relid);
+
+/*
+ * The pathkey list happens to have the same format as the
+ * partitioning key of a Hashed locus, so by disguising it
+ * we can use cdbpathlocus_pull_above_projection() to do
the 
+ * transformation.
+ */
+CdbPathLocus_MakeHashed(&notalocus, index_pathkeys);
+notalocus =
+cdbpathlocus_pull_above_projection(root,
+   notalocus,
+   rel->relids,
+   rel->reltargetlist,
+  
appendrel->reltargetlist,
+   appendrel->relid);
+if (CdbPathLocus_IsHashed(notalocus))
+index_pathkeys = truncate_useless_pathkeys(root,
appendrel,
+  
notalocus.partkey);
+else
+index_pathkeys = NULL;
+}
+
+useful_pathkeys = truncate_useless_pathkeys(root, rel,
+index_pathkeys);
 }
 else
 useful_pathkeys = NIL;
Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
===
RCS file: 
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/path/pathkeys.c,v
diff -u -N -r1.18 -r1.18.2.1
--- cdb-pg/src/backend/optimizer/path/pathkeys.c30 Apr 2007 05:44:07
-1.18
+++ cdb-pg/src/backend/optimizer/path/pathkeys.c10 Aug 2007 03:41:15
-1.18.2.1
@@ -1403,55 +1403,53 @@
 {
 PathKeyItem*item;
 Expr   *newexpr;
+AttrNumber  targetindex;
 
 Assert(pathkey);
 
-/* Use constant expr if available.  Will be at head of list. */
-if (CdbPathkeyEqualsConstant(pathkey))
+/* Find an expr that we can rewrite to use the projected columns. */
+item = cdbpullup_findPathKeyItemInTargetList(pathkey,
+ relids,
+ targetlist,
+ &targetindex); // OUT
+
+/* If not found, see if the equiv class contains a constant expr. */
+if (!item &&
+CdbPathkeyEqualsConstant(pathkey))
 {
 item = (PathKeyItem *)linitial(pathkey);
 newexpr = (Expr *)copyObject(item->key);
 }
 
-/* New vars for old! */
-else
-{
-AttrNumber  targetindex;
+/* Fail if no usable expr. */
+else if (!item)
+return NULL;
 
-/* Find an expr that we can rewrite to use the projected columns.
*/
-item = cdbpullup_findPathKeyItemInTargetList(pathkey,
- relids,
- targetlist,
- &targetindex); // OUT
-if 

Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

2007-08-24 Thread Luke Lonergan
We just fixed this - I'll post a patch, but I don't have time to verify
against HEAD.

- Luke


On 8/24/07 3:38 AM, "Heikki Linnakangas" <[EMAIL PROTECTED]> wrote:

> Anton wrote:
 =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
QUERY PLAN
>>> 
>>> -
 Limit  (cost=824637.69..824637.69 rows=1 width=32)
->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
  Sort Key: public.n_traf.date_time
  ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
  ->  Seq Scan on n_traf  (cost=0.00..22.30
 rows=1230 width=32)
  ->  Seq Scan on n_traf_y2007m01 n_traf
 (cost=0.00..22.30 rows=1230 width=32)
>> ...
  ->  Seq Scan on n_traf_y2007m12 n_traf
 (cost=0.00..22.30 rows=1230 width=32)
 (18 rows)
 
 Why it no uses indexes at all?
 ---
>>> I'm no expert but I'd guess that the the planner doesn't know which
>>> partition holds the latest time so it has to read them all.
>> 
>> Agree. But why it not uses indexes when it reading them?
> 
> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> below the append node. Therefore it needs to fetch all rows from all the
> tables, and the fastest way to do that is a seq scan.





Re: [PERFORM] [pgsql-jobs] Looking for database hosting

2007-08-19 Thread Luke Lonergan
Andrew,

I'd say that commodity systems are the fastest with postgres - many have seen 
big slowdowns with high end servers.  'Several orders of magnitude' is not 
possible by just changing the HW; you've got a SW problem to solve first.  We
have run 100+ times faster than both Postgres and popular (even gridded)
commercial DBMSs using an intrinsically parallel SW approach.

If the objective is OLAP / DSS there's no substitute for a parallel DB that 
does query and load / transform using all the CPUs and IO channels 
simultaneously.  This role is best met from a value standpoint by clustering 
commodity systems.

For OLTP, we need better SMP and DML algorithmic optimizations for concurrency, 
at which point big SMP machines work.  Right now you can buy a 32 CPU commodity 
(opteron) machine from SUN (X4600) for about $60K loaded.

WRT hosting, we've done a bit of it on GPDB systems, but we're not making it a 
focus area.  Instead, we do subscription pricing by the amount of data used and 
recommend / help get systems set up.

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Andrew Hammond [mailto:[EMAIL PROTECTED]
Sent:   Sunday, August 19, 2007 03:49 PM Eastern Standard Time
To: Niklas Saers
Cc: [EMAIL PROTECTED]; pgsql-performance@postgresql.org
Subject:Re: [PERFORM] [pgsql-jobs] Looking for database hosting

Nik, you may be underestimating just how much performance can be obtained
from a single database server. For example, an IBM p595 server connected to
an array of ds8300 storage devices could reasonably be expected to provide
several orders of magnitude more performance when compared to commodity
hardware. In commodity space (albeit, just barely), a 16 core opteron
running (the admittedly yet-to-be-released) FreeBSD 7, and a suitably
provisioned SAN should also enormously outperform a beige-box solution, and
at a fraction of the cost. If it's performance you care about then the
pgsql-performance list (which I have cc'd) is the place to talk about it.

I realize this doesn't address your desire to get out of database server
administration. I am not aware of any company which provides database
hosting, further I'm not entirely convinced that's a viable business
solution. The technical issues (security, latency and reliability are the
ones that immediately come to mind) associated with a hosted database server
solution suggest to me that this would not be economically viable. The
business issues around out-sourcing a critical, if not central component of
your architecture seem, at least to me, to be insurmountable.

Andrew


On 8/19/07, Niklas Saers <[EMAIL PROTECTED]> wrote:
>
> Hi,
> the company I'm doing work for is expecting a 20 times increase in
> data and seeks a 10 times increase in performance. Having pushed our
> database server to the limit daily for the past few months we have
> decided we'd prefer to be database users rather than database server
> admins. :-)
>
> Are you or can you recommend a database hosting company that is good
> for clients that require more power than what a single database
> server can offer?
>
> Cheers
>
> Nik
>
>


Re: [PERFORM] Integrated perc 5/i

2007-08-16 Thread Luke Lonergan
Yay - looking forward to your results!

- Luke


On 8/16/07 3:14 PM, "Merlin Moncure" <[EMAIL PROTECTED]> wrote:

> On 8/16/07, Luke Lonergan <[EMAIL PROTECTED]> wrote:
>> 
>>  Hi Michael,
>> 
>>  There is a problem with some Dell "perc 5" RAID cards, specifically we've
>> had this problem with the 2950 as of 6 months ago ­ they do not support
>> RAID10.  They have a setting that sounds like RAID10, but it actually
>> implements spanning of mirrors.  This means that you will not get more than
>> one disk worth of performance whether you are performing random seeks
>> (within a one disk sized area) or sequential transfers.
>> 
>>  I recommend you read the section in the Dell configuration guide very
>> carefully and look for supplemental sources of technical information about
>> it.  We found the issue clearly explained in a Dell technical memo that I
>> don't have in front of me ­ we were shocked to find this out.
> 
> interesting.  this may also be true of the other 'rebrands' of the lsi
> logic chipset.  for example, the ibm 8480/exp3000 sets up the same
> way, namely you do the 'spanadd' function of the firmware which layers
> the raids.
> 
> fwiw, I will be testing perc 5 raid 10, 01, 00, and 05 in a dual
> controller controller configuration as well as dual controller
> configuration over the md1000 (which is active/active) in a few days.
> This should give very good support to your claim if the arrays spanned
> in software singnificantly outperform a single controller.
> 
> merlin





Re: [PERFORM] Integrated perc 5/i

2007-08-16 Thread Luke Lonergan
Hi Michael,

There is a problem with some Dell "perc 5" RAID cards, specifically we've
had this problem with the 2950 as of 6 months ago - they do not support
RAID10.  They have a setting that sounds like RAID10, but it actually
implements spanning of mirrors.  This means that you will not get more than
one disk worth of performance whether you are performing random seeks
(within a one disk sized area) or sequential transfers.

I recommend you read the section in the Dell configuration guide very
carefully and look for supplemental sources of technical information about
it.  We found the issue clearly explained in a Dell technical memo that I
don't have in front of me - we were shocked to find this out.

As suggested - the RAID5 numbers from these controllers are very strong.

- Luke


On 8/16/07 1:26 AM, "Michael Ben-Nes" <[EMAIL PROTECTED]> wrote:

> Hi
> 
> I wanted to know if the integrated perc 5/i which come with Dell 2950 will
> yield maximum performance from RAID 10 ( 15K SAS ).
> Or should I ask for different card ?
> 
> I read an old post that shows that RAID 10 does not work efficiently under perc
> 5/i:
> http://groups.google.com/group/pgsql.performance/browse_thread/thread/b85926fe6de1f6c2/38837995887b6033?lnk=st&q=perc+5%2Fi+performance&rnum=6&hl=en#38837995887b6033
> 
> Does any one have any experience with RAID 10 & perc 5/i ?
> 
> Thanks,
> Miki




Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-08-01 Thread Luke Lonergan
Marc,

You should expect that for the kind of OLAP workload you describe in steps 2
and 3 you will have exactly one CPU working for you in Postgres.

If you want to accelerate the speed of this processing by a factor of 100 or
more on this machine, you should try Greenplum DB which is Postgres 8.2
compatible.  Based on the overall setup you describe, you may have a hybrid
installation with GPDB doing the reporting / OLAP workload and the other
Postgres databases handling the customer workloads.

- Luke


On 7/24/07 7:38 AM, "Marc Mamin" <[EMAIL PROTECTED]> wrote:

>  
> Hello,
> 
> thank you for all your comments and recommendations.
> 
> I'm aware that the conditions for this benchmark are not ideal, mostly
> due to the lack of time to prepare it. We will also need an additional
> benchmark on a less powerful - more realistic - server to better
> understand the scalability of our application.
> 
> 
> Our application is based on java and is generating dynamic reports from
> log file content. Dynamic means here that a report will be calculated
> from the postgres data the first time it is requested (it will then be
> cached). Java is used to drive the data preparation and to
> handle/generate the report requests.
> 
> This is much more an OLAP system than an OLTP one, at least as far as our
> performance concerns go.
> 
> 
> 
> 
> Data preparation:
> 
> 1) parsing the log files with a heavy use of perl (regular expressions)
> to generate csv files. Prepared statements also maintain reference
> tables in the DB. Postgres performance is not an issue for this first
> step.
> 
> 2) loading the csv files with COPY. As around 70% of the data to load
> come in a single daily table, we don't allow concurrent jobs for this
> step. We have between a few and a few hundred files to load into a
> single table; they are processed one after the other. A primary key is
> always defined; for the case when the required indexes are already built
> and when the new data are above a given size, we use a "shadow"
> table instead (without the indexes), build the index after the import,
> and then replace the live table with the shadow one.
> For example, we have a table of 13 GB + 11 GB indexes (5 pieces).
> 
> Performances :
> 
> a) is there an "ideal" size to consider for our csv files (100 x 10
> MB or better 1 x 1GB ?)
> b) maintenance_work_mem: I'll use around 1 GB as recommended by
> Stefan
> 
> 3) Data aggregation. This is the heaviest part for Postgres. On our
> current system some queries need over one hour, with phases of around
> 100% cpu use, alternating with times of heavy i/o load when temporary
> results are written/read to disk (pgsql_tmp). During the
> aggregation, other postgres activities are low (at least they should be) as
> this should take place at night. Currently we have a locking mechanism
> to avoid having more than one such query running concurrently. This
> may be too strict for the benchmark server but better reflects our current
> hardware capabilities.
> 
> Performances : Here we should favor a single huge transaction and
> consider a low probability of having another transaction requiring large
> sort space. Considering this, is it reasonable to define work_mem as
> 3GB (I guess I should raise this parameter dynamically before running
> the aggregation queries)
> 
> 4) Queries (report generation)
> 
> We have only a few requests which are not satisfactory; they require large
> sort operations. The data are structured in different aggregation levels
> (minutes, hours, days) with logical time-based partitions in order to
> limit the data size to compute for a given report. Moreover we can scale
> our infrastructure by using different or dedicated Postgres servers
> for different customers. Smaller customers may share the same instance,
> each of them having its own schema (the lock mechanism for large
> aggregations applies to a whole Postgres instance, not to a single
> customer). The benchmark will help us to plan such distribution.
> 
> During the benchmark, we will probably not have more than 50 non-idle
> connections simultaneously. It is a bit too early for us to fine-tune
> this part. The benchmark will mainly focus on steps 1 to 3.
> 
> During the benchmark, the Db will reach a size of about 400 GB,
> simulating 3 different customers, also with data quite equally split
> across 3 schemas.
> 
> 
> 
> I will post our configuration(s) later on.
> 
> 
> 
> Thanks again for all your valuable input.
> 
> Marc Mamin
> 





Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-30 Thread Luke Lonergan
Hi Dimitri,

Can you post some experimental evidence that these settings matter?

At this point we have several hundred terabytes of PG databases running on ZFS, 
all of them setting speed records for data warehouses.

We did testing on these settings last year on S10U2, perhaps things have 
changed since then.

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Dimitri [mailto:[EMAIL PROTECTED]
Sent:   Monday, July 30, 2007 05:26 PM Eastern Standard Time
To: Luke Lonergan
Cc: Josh Berkus; pgsql-performance@postgresql.org; Marc Mamin
Subject:Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

Luke,

ZFS tuning is not coming from general suggestions, but from real
practice...

So,
  - limiting the ARC is a MUST for the moment to keep your database running
comfortably (especially DWH!)
  - the 8K blocksize is chosen to read exactly one page when PG asks to
read one page - don't mix it with prefetch! when prefetch is detected,
ZFS will read the next blocks without any demand from PG; but otherwise
why read more pages each time when PG is asking for only one?...
  - prefetch of course is not needed for OLTP, but helps on OLAP/DWH, agree :)

Rgds,
-Dimitri


On 7/22/07, Luke Lonergan <[EMAIL PROTECTED]> wrote:
> Josh,
>
> On 7/20/07 4:26 PM, "Josh Berkus" <[EMAIL PROTECTED]> wrote:
>
> > There are some specific tuning parameters you need for ZFS or performance
> > is going to suck.
> >
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> > (scroll down to "PostgreSQL")
> > http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp
> > http://bugs.opensolaris.org/view_bug.do?bug_id=6437054
> >
> > You also don't say anything about what kind of workload you're running.
>
>
> I think we're assuming that the workload is OLTP when putting these tuning
> guidelines forward.  Note that the ZFS tuning guidance referred to in this
> bug article recommend "turning vdev prefetching off" for "random I/O
> (databases)".  This is exactly the opposite of what we should do for OLAP
> workloads.
>
> Also, the lore that setting recordsize on ZFS is mandatory for good database
> performance is similarly not appropriate for OLAP work.
>
> If the workload is OLAP / Data Warehousing, I'd suggest ignoring all of the
> tuning information from Sun that refers generically to "database".  The
> untuned ZFS performance should be far better in those cases.  Specifically,
> these three should be ignored:
> - (ignore this) limit ARC memory use
> - (ignore this) set recordsize to 8K
> - (ignore this) turn off vdev prefetch
>
> - Luke
>
>
>
>


Re: [PERFORM] update, truncate and vacuum

2007-07-26 Thread Luke Lonergan
1) Yes

All rows are treated the same; there are no in-place updates.

2) No

Truncate recreates the object as a new one, releasing the space held by the old 
one.
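
As a short sketch of both answers, using the table and UPDATE statement from
the question quoted below (the inserted value is made up):

  CREATE TABLE foo (id int, value numeric(20,6));
  INSERT INTO foo VALUES (1, 1.0);

  -- 1) Every UPDATE writes a new row version regardless of the column type,
  --    so the dead version is eventually reclaimed by (auto)vacuum.
  UPDATE foo SET value = 100 WHERE id = 1;
  VACUUM foo;

  -- 2) TRUNCATE swaps in a new, empty relation and releases the old
  --    storage immediately, so no VACUUM is needed afterwards.
  TRUNCATE foo;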

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Scott Feldstein [mailto:[EMAIL PROTECTED]
Sent:   Thursday, July 26, 2007 06:44 PM Eastern Standard Time
To: pgsql-performance@postgresql.org
Subject:[PERFORM] update, truncate and vacuum

Hi,
I have a couple questions about how update, truncate and vacuum would  
work together.

1) If I update a table foo (id int, value numeric (20, 6))
with
update foo set value = 100 where id = 1

Would a vacuum be necessary after this type of operation since the  
updated value is a numeric? (as opposed to a sql type where its size  
could potentially change i.e varchar)

2) After several updates/deletes to a table, if I truncate it, would  
it be necessary to run vacuum in order to reclaim the space?

thanks,
Scott



Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-22 Thread Luke Lonergan
Josh,

On 7/20/07 4:26 PM, "Josh Berkus" <[EMAIL PROTECTED]> wrote:

> There are some specific tuning parameters you need for ZFS or performance
> is going to suck.
> 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> (scroll down to "PostgreSQL")
> http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp
> http://bugs.opensolaris.org/view_bug.do?bug_id=6437054
> 
> You also don't say anything about what kind of workload you're running.


I think we're assuming that the workload is OLTP when putting these tuning
guidelines forward.  Note that the ZFS tuning guidance referred to in this
bug article recommend "turning vdev prefetching off" for "random I/O
(databases)".  This is exactly the opposite of what we should do for OLAP
workloads.

Also, the lore that setting recordsize on ZFS is mandatory for good database
performance is similarly not appropriate for OLAP work.

If the workload is OLAP / Data Warehousing, I'd suggest ignoring all of the
tuning information from Sun that refers generically to "database".  The
untuned ZFS performance should be far better in those cases.  Specifically,
these three should be ignored:
- (ignore this) limit ARC memory use
- (ignore this) set recordsize to 8K
- (ignore this) turn off vdev prefetch

- Luke





Re: [GENERAL] [PERFORM] Parrallel query execution for UNION ALL Queries

2007-07-19 Thread Luke Lonergan
Dimitri,

> Seems to me that : 
>  - GreenPlum provides some commercial parallel query engine on top of
>PostgreSQL,

I certainly think so and so do our customers in production with 100s of
terabytes :-)
  
>  - plproxy could be a solution to the given problem.
>https://developer.skype.com/SkypeGarage/DbProjects/PlProxy

This is solving real world problems at Skype of a different kind than
Greenplum, well worth checking out.

- Luke




Re: [PERFORM] upgraded to pgsql 8.2.4, getting worse performance then 7.4.x

2007-06-04 Thread Luke Lonergan
Absolutely!

A summary of the relevant comments so far:
- enable_mergejoin
- shared_buffers
- fsync

Another setting to consider if you use indexes is random_page_cost.

What would be helpful is if you could identify a slow query and post the 
explain analyze here.

The concurrent performance of many users should just be faster with 8.2, so I'd 
think it's a problem with plans.
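
For reference, a minimal sketch of what to post - the table and predicate are
placeholders, so substitute a statement that is actually slow for you:

  -- Run the slow statement under EXPLAIN ANALYZE and post the full output:
  EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_column = 42;

  -- The settings above can also be toggled per session to compare plans:
  SET enable_mergejoin = on;
  EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_column = 42;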

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Douglas J Hunley [mailto:[EMAIL PROTECTED]
Sent:   Monday, June 04, 2007 08:40 AM Eastern Standard Time
To: Luke Lonergan
Cc: Tom Lane; pgsql-performance@postgresql.org
Subject:Re: [PERFORM] upgraded to pgsql 8.2.4, getting worse 
performance then 7.4.x

On Sunday 03 June 2007 16:39:51 Luke Lonergan wrote:
> When you initdb, a config file is edited from the template by initdb to
> reflect your machine config.

I didn't realize that. I'll have to harass the rest of the team to see if 
someone overwrote that file or not. In the interim, I did an 'initdb' to 
another location on the same box and then copied those values into the config 
file. That's cool to do, I assume?

-- 
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

Cowering in a closet is starting to seem like a reasonable plan.



Re: [PERFORM] upgraded to pgsql 8.2.4, getting worse performance then 7.4.x

2007-06-03 Thread Luke Lonergan
When you initdb, a config file is edited from the template by initdb to reflect 
your machine config.

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Douglas J Hunley [mailto:[EMAIL PROTECTED]
Sent:   Sunday, June 03, 2007 02:30 PM Eastern Standard Time
To: Tom Lane
Cc: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] upgraded to pgsql 8.2.4, getting worse 
performance then 7.4.x

On Saturday 02 June 2007 11:25:11 Tom Lane wrote:
> Another thing that seems strange is that the 8.2 config file does not
> seem to have been processed by initdb --- or did you explicitly comment
> out the settings it made?

I don't understand this comment. You are saying 'initdb' will make changes to 
the file? The file I sent is the working copy from the machine in question.

-- 
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

"Does it worry you that you don't talk any kind of sense?"




Re: [PERFORM] Autodetect of software RAID1+0 fails

2007-06-01 Thread Luke Lonergan
Steinar,

On 6/1/07 2:35 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote:

> Either do your md discovery in userspace via mdadm (your distribution can
> probably help you with this), or simply use the raid10 module instead of
> building raid1+0 yourself.

I found md raid10 to be *very* slow compared to raid1+0 on Linux 2.6.9 ->
2.6.18.  Very slow in this case is < 400 MB/s compared to 1,800 MB/s.

- Luke 





Re: [PERFORM] Autodetect of software RAID1+0 fails

2007-06-01 Thread Luke Lonergan
Dimitri,

LVM is great, one thing to watch out for: it is very slow compared to pure
md.  That will only matter in practice if you want to exceed 1GB/s of
sequential I/O bandwidth.

- Luke


On 6/1/07 11:51 AM, "Dimitri" <[EMAIL PROTECTED]> wrote:

> Craig,
> 
> to make things work properly here you need to create a config file
> keeping both the raid1 and raid0 information (/etc/mdadm/mdadm.conf).
> However if your root filesystem is corrupted, or you lose this file,
> or move disks somewhere else - you are back to the same initial issue
> :))
> 
> So, the solution I've found 100% working in any case is: use mdadm to
> create raid1 devices (as you do already) and then use LVM to create
> the raid0 volume on it - LVM writes its own labels on every MD device and
> will find its volume pieces automatically! Tested for crash several
> times and was surprised by its robustness :))
> 
> Rgds,
> -Dimitri
> 
> On 6/1/07, Craig James <[EMAIL PROTECTED]> wrote:
>> Apologies for a somewhat off-topic question, but...
>> 
>> The Linux kernel doesn't properly detect my software RAID1+0 when I boot up.
>>  It detects the two RAID1 arrays, the partitions of which are marked
>> properly.  But it can't find the RAID0 on top of that, because there's no
>> corresponding device to auto-detect.  The result is that it creates /dev/md0
>> and /dev/md1 and assembles the RAID1 devices on bootup, but /dev/md2 isn't
>> created, so the RAID0 can't be assembled at boot time.
>> 
>> Here's what it looks like:
>> 
>> $ cat /proc/mdstat
>> Personalities : [raid0] [raid1]
>> md2 : active raid0 md0[0] md1[1]
>>   234436224 blocks 64k chunks
>> 
>> md1 : active raid1 sde1[1] sdc1[2]
>>   117218176 blocks [2/2] [UU]
>> 
>> md0 : active raid1 sdd1[1] sdb1[0]
>>   117218176 blocks [2/2] [UU]
>> 
>> $ uname -r
>> 2.6.12-1.1381_FC3
>> 
>> After a reboot, I always have to do this:
>> 
>>   mknod /dev/md2 b 9 2
>>   mdadm --assemble /dev/md2 /dev/md0 /dev/md1
>>   mount /dev/md2
>> 
>> What am I missing here?
>> 
>> Thanks,
>> Craig
>> 
>> 
> 
> 





Re: [PERFORM] setting up raid10 with more than 4 drives

2007-05-30 Thread Luke Lonergan
Mark,

On 5/30/07 8:57 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> One part is corruption. Another is ordering and consistency. ZFS represents
> both RAID-style storage *and* journal-style file system. I imagine consistency
> and ordering is handled through journalling.

Yep and versioning, which answers PFC's scenario.

Short answer: ZFS has a very reliable model that uses checksumming and
journaling along with block versioning to implement "self healing".  There
are others that do some similar things with checksumming on the SAN HW and
cooperation with the filesystem.

- Luke





Re: [PERFORM] Bad RAID1 read performance

2007-05-30 Thread Luke Lonergan
Albert,

On 5/30/07 8:00 AM, "Albert Cervera Areny" <[EMAIL PROTECTED]> wrote:

> Hardware isn't very good I believe, and it's about 2-3 years old, but the RAID
> is Linux software, and though not very good the difference between reading
> and writing should probably be greater... (?)

Not for one thread/process of I/O.  Mirror sets can nearly double the read
performance on most RAID adapters or SW RAID when using two or more
threads/processes, but a single thread will get one drive's worth of
performance.

You should try running two simultaneous processes during reading and see
what you get.
 
> Would you set 512Kb readahead on both drives and RAID? I tried various
> configurations and none seemed to make a big difference. It seemed correct to
> me to set 512kb per drive and 1024kb for md0.

Shouldn't matter that much, but yes, each drive getting half the readahead
is a good strategy.  Try 256+256 and 512.

The problem you have is likely not related to the readahead though - I
suggest you try read/write to a single disk and see what you get.  You
should get around 60 MB/s if the drive is a modern 7200 RPM SATA disk.  If
you aren't getting that on a single drive, there's something wrong with the
SATA driver or the drive(s).

- Luke 
> On Wednesday, 30 May 2007 at 16:09, Luke Lonergan wrote:
>> This sounds like a bad RAID controller - are you using a built-in hardware
>> RAID?  If so, you will likely want to use Linux software RAID instead.
>> 
>> Also - you might want to try a 512KB readahead - I've found that is optimal
>> for RAID1 on some RAID controllers.
>> 
>> - Luke
>> 
>> On 5/30/07 2:35 AM, "Albert Cervera Areny" <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>> after doing the "dd" tests for a server we have at work I obtained:
>>> Read: 47.20 Mb/s
>>> Write: 39.82 Mb/s
>>> Some days ago read performance was around 20Mb/s due to no readahead in
>>> md0 so I modified it using hdparm. However, it seems to me that, it being
>>> a RAID1, read speed could be much better. These are SATA disks with 3Gb of
>>> RAM so I did 'time bash -c "dd if=/dev/zero of=bigfile bs=8k count=786432
>>> && sync"'. File system is ext3 (I've read many times on the list that XFS
>>> is faster), but I don't want to change the file system right now.
>>> Modifying the readahead from the current 1024k to 2048k doesn't make any
>>> difference. Are there any other tweaks I can make?
>>> 
>>> 
>> 





Re: [PERFORM] setting up raid10 with more than 4 drives

2007-05-30 Thread Luke Lonergan

> This is standard stuff, very well proven: try googling 'self healing zfs'.

The first hit on this search is a demo of ZFS detecting corruption of one of
the mirror pair using checksums, very cool:
  
http://www.opensolaris.org/os/community/zfs/demos/selfheal/;jsessionid=52508D464883F194061E341F58F4E7E1

The bad drive is pointed out directly using the checksum and the data
integrity is preserved.

- Luke



Re: [PERFORM] setting up raid10 with more than 4 drives

2007-05-30 Thread Luke Lonergan
It's created when the data is written to both drives.

This is standard stuff, very well proven: try googling 'self healing zfs'.

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Michael Stone [mailto:[EMAIL PROTECTED]
Sent:   Wednesday, May 30, 2007 11:11 AM Eastern Standard Time
To: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] setting up raid10 with more than 4 drives

On Wed, May 30, 2007 at 10:36:48AM -0400, Luke Lonergan wrote:
>> I don't see how that's better at all; in fact, it reduces to 
>> exactly the same problem: given two pieces of data which 
>> disagree, which is right?  
>
>The one that matches the checksum.

And you know the checksum is good, how?

Mike Stone




Re: [PERFORM] setting up raid10 with more than 4 drives

2007-05-30 Thread Luke Lonergan
> I don't see how that's better at all; in fact, it reduces to 
> exactly the same problem: given two pieces of data which 
> disagree, which is right?  

The one that matches the checksum.

- Luke




Re: [PERFORM] Bad RAID1 read performance

2007-05-30 Thread Luke Lonergan
This sounds like a bad RAID controller - are you using a built-in hardware
RAID?  If so, you will likely want to use Linux software RAID instead.

Also - you might want to try a 512KB readahead - I've found that is optimal
for RAID1 on some RAID controllers.

- Luke 


On 5/30/07 2:35 AM, "Albert Cervera Areny" <[EMAIL PROTECTED]> wrote:

> Hi,
> after doing the "dd" tests for a server we have at work I obtained:
> Read: 47.20 Mb/s
> Write: 39.82 Mb/s
> Some days ago read performance was around 20Mb/s due to no readahead in md0
> so I modified it using hdparm. However, it seems to me that, it being a RAID1,
> read speed could be much better. These are SATA disks with 3Gb of RAM so I
> did 'time bash -c "dd if=/dev/zero of=bigfile bs=8k count=786432 && sync"'.
> File system is ext3 (I've read many times on the list that XFS is faster), but
> I don't want to change the file system right now. Modifying the readahead from
> the current 1024k to 2048k doesn't make any difference. Are there any other
> tweaks I can make?
>  
> 
> 





Re: [PERFORM] setting up raid10 with more than 4 drives

2007-05-30 Thread Luke Lonergan
Hi Peter,

On 5/30/07 12:29 AM, "Peter Childs" <[EMAIL PROTECTED]> wrote:

> Good point, also if you had Raid 1 with 3 drives with some bit errors at least
> you can take a vote on what's right. Whereas if you only have 2 and they
> disagree, how do you know which is right other than to pick one and hope... But
> whatever, it will be slower to keep in sync on a heavy write system.

Much better to get a RAID system that checksums blocks so that "good" is
known.  Solaris ZFS does that, as do high end systems from EMC and HDS.

- Luke





Re: [PERFORM] setting up raid10 with more than 4 drives

2007-05-29 Thread Luke Lonergan
Stephen,

On 5/29/07 8:31 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote:

> It's just more copies of the same data if it's really a RAID1, for the
> extra, extra paranoid.  Basically, in the example above, I'd read it as
> "D1, D2, D5 have identical data on them".

In that case, I'd say it's a waste of disk to add 1+2 redundancy to the
mirrors.

- Luke 





Re: [PERFORM] setting up raid10 with more than 4 drives

2007-05-29 Thread Luke Lonergan
Hi Rajesh,

On 5/29/07 7:18 PM, "Rajesh Kumar Mallah" <[EMAIL PROTECTED]> wrote:

> D1 raid1 D2 raid1 D5  --> MD0
> D3 raid1 D4 raid1 D6  --> MD1
> MD0 raid0 MD1  --> MDF (final)

AFAIK you can't RAID1 more than two drives, so the above doesn't make sense
to me.

- Luke





Re: [PERFORM] setting up raid10 with more than 4 drives

2007-05-29 Thread Luke Lonergan
Stripe of mirrors is preferred to mirror of stripes for the best balance of
protection and performance.

In the stripe of mirrors you can lose up to half of the disks and still be
operational.  In the mirror of stripes, the most you could lose is two
drives.  The performance of the two should be similar - perhaps the seek
performance would be different for high concurrent use in PG.

- Luke


On 5/29/07 2:14 PM, "Rajesh Kumar Mallah" <[EMAIL PROTECTED]> wrote:

> hi,
> 
> this is not really postgresql specific, but any help is appreciated.
> i have read that the more spindles, the better it is for IO performance.
> 
> suppose i have 8 drives, should a stripe (raid0) be created on
> 2 mirrors (raid1) of 4 drives each, OR should a stripe be created on 4 mirrors
> of 2 drives each?
> 
> also, does a single channel or dual channel controller make a lot
> of difference in raid10 performance?
> 
> regds
> mallah.
> 
> 





Re: [PERFORM] New performance documentation released

2007-05-16 Thread Luke Lonergan
Cool!

Now we can point people to your faq instead of repeating the "dd" test
instructions.  Thanks for normalizing this out of the list :-)

- Luke


On 5/15/07 8:55 PM, "Greg Smith" <[EMAIL PROTECTED]> wrote:

> I've been taking notes on what people ask about on this list, mixed that
> up with work I've been doing lately, and wrote some documentation readers
> of this mailing list may find useful.  There are a series of articles now
> at http://www.westnet.com/~gsmith/content/postgresql/ about performance
> testing and tuning.
> 
> The "5-minute Introduction to PostgreSQL Performance" and the "Disk
> performance testing" articles were aimed to be FAQ-style pieces people
> asking questions here might be pointed toward.
> 
> All of the pieces in the "Advanced Topics" sections aren't finished to my
> standards yet, but may be useful anyway so I've posted what I've got so
> far.
> 
> --
> * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
> 
> 





Re: [PERFORM] Throttling PostgreSQL's CPU usage

2007-05-08 Thread Luke Lonergan
You can use the workload management feature that we've contributed to
Bizgres.  That allows you to control the level of statement concurrency by
establishing queues and associating them with roles.

That would provide the control you are seeking.

- Luke


On 5/8/07 4:24 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> On Tue, 8 May 2007, Carlos Moreno wrote:
> 
>> Daniel Griscom wrote:
>>> 
>>>  Several people have mentioned having multiple processors; my current
>>>  machine is a uni-processor machine, but I believe we could spec the actual
>>>  runtime machine to have multiple processors/cores.
>> 
>> My estimate is that yes, you should definitely consider that.
>> 
>>>  I'm only running one query at a time; would that query be guaranteed to
>>>  confine itself to a single processor/core?
>> 
>> From what Joshua mentions, looks like you do have that guarantee.
> 
> isn't there a way to limit how many processes postgres will create?
> 
> if this is limited to 1, what happens when a vaccum run hits (or
> autovaccum)
> 
> David Lang
> 
> 
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend
> 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Best OS for Postgres 8.2

2007-05-08 Thread Luke Lonergan
WRT ZFS on Linux, if someone were to port it, the license issue would get 
worked out IMO (with some discussion to back me up).  From discussions with the 
developers, the biggest issue is a technical one: the Linux VFS layer makes the 
port difficult.

I don't hold out any hope that the FUSE port will be a happy thing; the
performance won't be there.

Any volunteers to port ZFS to Linux?

- Luke


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Simple query, 10 million records...MySQL ten times faster

2007-04-26 Thread Luke Lonergan
NUMERIC operations are very slow in pgsql.  Equality comparisons are somewhat
faster, but other operations are very slow compared to other vendors' NUMERIC.

We've sped it up a lot here internally, but you may want to consider using 
FLOAT for what you are doing.
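
If the test_zip_assoc schema shown further down in this thread is representative,
a minimal sketch of that change (assuming a release with ALTER TABLE ... ALTER
COLUMN ... TYPE, i.e. 8.0 or later) would be:

  ALTER TABLE test_zip_assoc ALTER COLUMN lat_radians TYPE float8;
  ALTER TABLE test_zip_assoc ALTER COLUMN long_radians TYPE float8;
  ANALYZE test_zip_assoc;   -- refresh statistics after the rewrite

The indexes on those columns should be rebuilt automatically as part of the
rewrite.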

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Bill Moran [mailto:[EMAIL PROTECTED]
Sent:   Thursday, April 26, 2007 05:13 PM Eastern Standard Time
To: zardozrocks
Cc: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] Simple query, 10 million records...MySQL ten 
times faster

In response to zardozrocks <[EMAIL PROTECTED]>:

> I have this table:
> 
> CREATE TABLE test_zip_assoc (
> id serial NOT NULL,
> f_id integer DEFAULT 0 NOT NULL,
> lat_radians numeric(6,5) DEFAULT 0.0 NOT NULL,
> long_radians numeric(6,5) DEFAULT 0.0 NOT NULL
> );
> CREATE INDEX lat_radians ON test_zip_assoc USING btree (lat_radians);
> CREATE INDEX long_radians ON test_zip_assoc USING btree
> (long_radians);
> 
> 
> 
> It's basically a table that associates some foreign_key (for an event,
> for instance) with a particular location using longitude and
> latitude.  I'm basically doing a simple proximity search.  I have
> populated the database with *10 million* records.  I then test
> performance by picking 50 zip codes at random and finding the records
> within 50 miles with a query like this:
> 
> SELECT id
>   FROM test_zip_assoc
>   WHERE
>   lat_radians > 0.69014816041
>   AND lat_radians < 0.71538026567
>   AND long_radians > -1.35446228028
>   AND long_radians < -1.32923017502
> 
> 
> On my development server (dual proc/dual core Opteron 2.8 Ghz with 4GB
> ram) this query averages 1.5 seconds each time it runs after a brief
> warmup period.  In PostGreSQL it averages about 15 seconds.
> 
> Both of those times are too slow.  I need the query to run in under a
> second with as many as a billion records.  I don't know if this is
> possible but I'm really hoping someone can help me restructure my
> indexes (multicolumn?, multiple indexes with a 'where' clause?) so
> that I can get this running as fast as possible.
> 
> If I need to consider some non-database data structure in RAM I will
> do that too.  Any help or tips would be greatly appreciated.  I'm
> willing to go to greath lengths to test this if someone can make a
> good suggestion that sounds like it has a reasonable chance of
> improving the speed of this search.  There's an extensive thread on my
> efforts already here:
> 
> http://phpbuilder.com/board/showthread.php?t=10331619&page=10

Why didn't you investigate/respond to the last posts there?  The advice
to bump shared_buffers is good advice.  work_mem might also need bumped.

Figure out which postgresql.conf your system is using and get it dialed
in for your hardware.  You can make all the indexes you want, but if
you've told Postgres that it only has 8M of RAM to work with, performance
is going to suck.  I don't see hardware specs on that thread (but I
didn't read the whole thing)  If the system you're using is a dedicated
DB system, set shared_buffers to 1/3 - 1/2 of the physical RAM on the
machine for starters.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org



Re: [PERFORM] What`s wrong with JFS configuration?

2007-04-26 Thread Luke Lonergan
The outer track / inner track performance ratio is more like 40 percent.
A recent example is 78MB/s outer and 44MB/s inner for the new Seagate 750GB
drive (see http://www.storagereview.com for benchmark results).

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Jim Nasby [mailto:[EMAIL PROTECTED]
Sent:   Thursday, April 26, 2007 03:53 AM Eastern Standard Time
To: Pawel Gruszczynski
Cc: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] What`s wrong with JFS configuration?

On Apr 25, 2007, at 8:51 AM, Pawel Gruszczynski wrote:
> where u6 stores Fedora Core 6 operating system, and u0 stores 3  
> partitions with ext2, ext3 and jfs filesystem.

Keep in mind that drives have a faster data transfer rate at the  
outer-edge than they do at the inner edge, so if you've got all 3  
filesystems sitting on that array at the same time it's not a fair  
test. I heard numbers on the impact of this a *long* time ago and I  
think it was in the 10% range, but I could be remembering wrong.

You'll need to drop each filesystem and create the next one to get a  
fair comparison.
--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match



Re: [PERFORM] Cache hit ratio

2007-04-03 Thread Luke Lonergan
Set log_executor_stats=true;

Then look in the log after running statements (or tail -f logfile).
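
For a quick aggregate number, the statistics views can also be queried directly
(a sketch; it assumes block-level stats are being collected, i.e.
stats_block_level = on in the 8.x releases):

  SELECT sum(blks_hit)::float8 /
         nullif(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_ratio
  FROM pg_stat_database;

Note this only counts hits in PostgreSQL's shared buffers, not the OS cache.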

- Luke


On 4/3/07 7:12 AM, "Jean Arnaud" <[EMAIL PROTECTED]> wrote:

> Hi
> 
> Is there a way to get the cache hit ratio in PostGreSQL ?
> 
> Cheers



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Luke Lonergan
Andreas,

On 3/22/07 4:48 AM, "Andreas Tille" <[EMAIL PROTECTED]> wrote:

> Well, to be honest I'm not really interested in the performance of
> count(*).  I was just discussing general performance issues on the
> phone line and when my colleague asked me about the size of the
> database he just wonderd why this takes so long for a job his
> MS-SQL server is much faster.  So in principle I was just asking
> a first question that is easy to ask.  Perhaps I come up with
> more difficult optimisation questions.

This may be the clue you needed - in Postgres SELECT COUNT(*) is an
approximate way to measure the speed of your disk setup (up to about
1,200MB/s).  Given that you are having performance problems, it may be that
your disk layout is either:
- slow by design
- malfunctioning

If this is the case, then any of your queries that require a full table scan
will be affected.

You should check your sequential disk performance using the following:

time bash -c "dd if=/dev/zero of=/your_file_system/bigfile bs=8k
count=(your_memory_size_in_KB*2/8) && sync"
time dd if=/your_file_system/bigfile of=/dev/null bs=8k

Report those times here and we can help you with it.

- Luke



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Benchmarking PGSQL?

2007-02-14 Thread Luke Lonergan
Hi Merlin,

On 2/14/07 8:20 AM, "Merlin Moncure" <[EMAIL PROTECTED]> wrote:

> I am curious what is your take on the maximum insert performance, in
> mb/sec of large bytea columns (toasted), and how much if any greenplum
> was able to advance this over the baseline.  I am asking on behalf of
> another interested party.  Interested in numbers broken down per core
> on 8 core quad system and also aggreate.

Our approach is to attach a segment to each core, so we scale INSERT
linearly on number of cores.  So the per core limit we live with is the
10-20MB/s observed here.  We'd like to improve that so that we get better
performance with smaller machines.

We have demonstrated insert performance of 670 MB/s, 2.4TB/hour for
non-toasted columns using 3 load machines against 120 cores.  This rate was
load machine limited.

WRT toasted bytea columns we haven't done any real benchmarking of those.
Do you have a canned benchmark we can run?

- Luke 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Benchmarking PGSQL?

2007-02-14 Thread Luke Lonergan
Here's one:

Insert performance is limited to about 10-12 MB/s no matter how fast the
underlying I/O hardware.  Bypassing the WAL (write ahead log) only boosts
this to perhaps 20 MB/s.  We've found that the biggest time consumer in the
profile is the collection of routines that "convert to datum".

You can perform the test using any dataset, you might consider using the
TPC-H benchmark kit with a data generator available at www.tpc.org.  Just
generate some data, load the schema, then perform some COPY statements,
INSERT INTO SELECT FROM and CREATE TABLE AS SELECT.
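
For instance, with the TPC-H lineitem table the three load paths look roughly
like this (a sketch; the file path is a placeholder and lineitem_copy is assumed
to have been created with the same definition, e.g. CREATE TABLE lineitem_copy
(LIKE lineitem)):

  \timing
  COPY lineitem FROM '/path/to/lineitem.tbl' WITH DELIMITER '|';
  INSERT INTO lineitem_copy SELECT * FROM lineitem;
  CREATE TABLE lineitem_ctas AS SELECT * FROM lineitem;

Dividing the loaded bytes by each elapsed time gives the MB/s rate to compare.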

- Luke


On 2/14/07 2:00 AM, "Krishna Kumar" <[EMAIL PROTECTED]> wrote:

> Hello All, 
> I'm a performance engineer, quite interested in getting deep into the PGSQL
> performance enhancement effort. In that regard, I have the following questions
> : 
> 1. Is there a benchmarking setup, that I can access online?
> 2. What benchmarks are we running , for performance numbers?
> 3. What are the current issues, related to performance?
> 4. Where can I start, with the PGSQL performance effort?
> 
> Thanks a lot, 
> Krishna 
> 




Re: [PERFORM] trouble with a join on OS X

2007-02-02 Thread Luke Lonergan
> \o /tmp/really_big_cursor_return
> 
> ;)

Tough crowd :-D

- Luke



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] trouble with a join on OS X

2007-02-02 Thread Luke Lonergan
Tom,

On 2/2/07 2:18 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> as of 8.2 there's a psql variable
> FETCH_COUNT that can be set to make it happen behind the scenes.)

FETCH_COUNT is a godsend and works beautifully for exactly this purpose.
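
For reference, a minimal psql session using it might look like this (the table
name is a placeholder):

  \set FETCH_COUNT 1000
  SELECT * FROM bigtable;

psql then fetches and displays the result in 1000-row groups instead of
collecting the entire result set in memory first.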

Now he's got to worry about how to page through 8GB of results in something
less than geological time with the space bar ;-)

- Luke



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Tom,

On 1/30/07 9:55 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> Gregory Stark wrote:
>>> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on
>>> your data distribution. It's not hard to come up with distributions where
>>> it's
>>> 1000x as fast and others where there's no speed difference.)
> 
>> So the figure is really "1-1000x"?  I bet this one is more impressive in
>> PHB terms.
> 
> Luke has a bad habit of quoting numbers that are obviously derived from
> narrow benchmarking scenarios as Universal Truths, rather than providing
> the context they were derived in.  I wish he'd stop doing that...

In this case I was referring to results obtained using grouping and distinct
optimizations within sort where the benefit is from the use of a single pass
instead of the multiple merge passes for external sort followed by a UNIQUE
operator.  In this case, the benefit ranges from 2-5x in many examples as I
mentioned: "from 2-5 times faster than a typical external sort".  This is
also the same range of benefits we see for this optimization with a popular
commercial database. With the limit/sort optimization we have seen more
dramatic results, but I think those are less typical cases.

Here are some results for a 1GB table and a simple COUNT(DISTINCT) on a
column with 7 unique values from my dual CPU laptop running Greenplum DB (PG
8.2.1 compatible) on both CPUs. Note that my laptop has 2GB of RAM so I have
the 1GB table loaded into OS I/O cache.  The unmodified external sort spills
the sorted attribute to disk, but that takes little time.  Note that the
COUNT(DISTINCT) plan embeds a sort as the transition function in the
aggregation node.

======================================
== No Distinct Optimization in Sort ==
======================================
lukelonergan=# select count(distinct l_shipmode) from lineitem;
 count 
---
 7
(1 row)

Time: 37832.308 ms

lukelonergan=# explain analyze select count(distinct l_shipmode) from
lineitem;  
QUERY PLAN 



 Aggregate  (cost=159175.30..159175.31 rows=1 width=8)
   Total 1 rows with 40899 ms to end, start offset by 3.189 ms.
   ->  Gather Motion 2:1  (slice2)  (cost=159175.25..159175.28 rows=1
width=8)
 recv:  Total 2 rows with 39387 ms to first row, 40899 ms to end,
start offset by 3.191 ms.
 ->  Aggregate  (cost=159175.25..159175.26 rows=1 width=8)
   Avg 1.00 rows x 2 workers.  Max 1 rows (seg0) with 39367 ms
to end, start offset by 22 ms.
   ->  Redistribute Motion 2:2  (slice1)  (cost=0.00..151672.00
rows=3001300 width=8)
 recv:  Avg 3000607.50 rows x 2 workers.  Max 3429492
rows (seg1) with 0.362 ms to first row, 8643 ms to end, start offset by 23
ms.
 Hash Key: lineitem.l_shipmode
 ->  Seq Scan on lineitem  (cost=0.00..91646.00
rows=3001300 width=8)
   Avg 3000607.50 rows x 2 workers.  Max 3001300
rows (seg0) with 0.049 ms to first row, 2813 ms to end, start offset by
12.998 ms.
 Total runtime: 40903.321 ms
(12 rows)

Time: 40904.013 ms

========================================
== With Distinct Optimization in Sort ==
========================================
lukelonergan=# set mpp_sort_flags=1;
SET
Time: 1.425 ms
lukelonergan=# select count(distinct l_shipmode) from lineitem;
 count 
---
 7
(1 row)

Time: 12846.466 ms

lukelonergan=# explain analyze select count(distinct l_shipmode) from
lineitem;
   
QUERY PLAN 



 Aggregate  (cost=159175.30..159175.31 rows=1 width=8)
   Total 1 rows with 13754 ms to end, start offset by 2.998 ms.
   ->  Gather Motion 2:1  (slice2)  (cost=159175.25..159175.28 rows=1
width=8)
 recv:  Total 2 rows with 13754 ms to end, start offset by 3.000 ms.
 ->  Aggregate  (cost=159175.25..159175.26 rows=1 width=8)
   Avg 1.00 rows x 2 workers.  Max 1 rows (seg0) with 13734 ms
to end, start offset by 23 ms.
   ->  Redistribute Motion 2:2  (slice1)  (cost=0.00..151672.00
rows=3001300 width=8)
 recv:  Avg 3000607.50 rows x 2 workers.  Max 3429492
rows (seg1) with 0.352 ms to first row, 10145 ms to end, start offset by 26
ms.
 Hash Key: lineitem.l_shipmode
 ->  Seq Scan on lineitem  (cost=0.00..91646.00
rows=3001300 width=8)
   Avg 3000607.50 rows x 2 workers.  Max 3001300
rows (seg0) with 0.032 ms to first row

Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Argh!



On 1/30/07 3:00 PM, "Josh Berkus"  wrote:

> 
> -- 
> --Josh
> 
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco
> 



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Alvaro,

On 1/30/07 9:04 AM, "Alvaro Herrera" <[EMAIL PROTECTED]> wrote:

>> (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on
>> your data distribution. It's not hard to come up with distributions where
>> it's
>> 1000x as fast and others where there's no speed difference.)
> 
> So the figure is really "1-1000x"?  I bet this one is more impressive in
> PHB terms.

You got me - I'll bite - what's PHB?

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Chad,

On 1/30/07 7:03 AM, "Chad Wagner" <[EMAIL PROTECTED]> wrote:

> On 1/30/07, Luke Lonergan <[EMAIL PROTECTED]> wrote:
>> Not that it helps Igor, but we've implemented single pass sort/unique,
>> grouping and limit optimizations and it speeds things up to a single seqscan
>> over the data, from 2-5 times faster than a typical external sort.
> 
> Was that integrated back into PostgreSQL, or is that part of Greenplum's
> offering? 

Not yet, we will submit to PostgreSQL along with other executor node
enhancements like hybrid hash agg (fixes the memory overflow problem with
hash agg) and some other great sort work.  These are all "cooked" and in the
Greenplum DBMS, and have proven themselves significant on customer workloads
with tens of terabytes already.

For now it seems that the "Group By" trick Brian suggested in this thread
combined with lots of work_mem may speed things up for this case if HashAgg
is chosen.  Watch out for misestimates of stats though - hash agg may
overallocate RAM in some cases.
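
A concrete sketch of that combination (table and column names are placeholders;
the quoted memory syntax needs 8.2, on earlier releases set work_mem in kB):

  SET work_mem = '256MB';
  EXPLAIN SELECT col FROM bigtable GROUP BY col;   -- check that HashAggregate is chosen
  SELECT col FROM bigtable GROUP BY col;           -- instead of SELECT DISTINCT col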

>> I can't think of a way that indexing would help this situation given the
>> required visibility check of each tuple.
> 
> I agree, using indexes as a "skinny" table is a whole other feature that would
> be nice. 

Yah - I like Hannu's ideas to make visibility less of a problem.  We're
thinking about this too.

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Luke Lonergan
Chad,

On 1/30/07 6:13 AM, "Chad Wagner" <[EMAIL PROTECTED]> wrote:

> Sounds like an opportunity to implement a "Sort Unique" (sort of like a hash,
> I guess), there is no need to push 3M rows through a sort algorithm to only
> shave it down to 1848 unique records.
> 
> I am assuming this optimization just isn't implemented in PostgreSQL?

Not that it helps Igor, but we've implemented single pass sort/unique,
grouping and limit optimizations and it speeds things up to a single seqscan
over the data, from 2-5 times faster than a typical external sort.

I can't think of a way that indexing would help this situation given the
required visibility check of each tuple.

- Luke



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [pgsql-advocacy] Postgres and really huge tables

2007-01-18 Thread Luke Lonergan
Chris,

On 1/18/07 1:42 PM, "Chris Mair" <[EMAIL PROTECTED]> wrote:

> A lot of data, but not a lot of records... I don't know if that's
> valid. I guess the people at Greenplum and/or Sun have more exciting
> stories ;)

You guess correctly :-)

Given that we're Postgres 8.2, etc compatible, that might answer Brian's
coworker's question.  Soon we will be able to see that Greenplum/Postgres
are handling the world's largest databases both in record count and size.

While the parallel scaling technology we employ is closed source, we are
still contributing scaling technology to the community (partitioning, bitmap
index, sort improvements, resource management, more to come), so Postgres as
a "bet" is likely safer and better than a completely closed source
commercial product.

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] max() versus order/limit (WAS: High update

2007-01-14 Thread Luke Lonergan
Adam,

This optimization would require teaching the planner to use an index for
MAX/MIN when available.  It seems like an OK thing to do to me.

- Luke

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Adam Rich
> Sent: Sunday, January 14, 2007 8:52 PM
> To: 'Joshua D. Drake'; 'Tom Lane'
> Cc: 'Craig A. James'; 'PostgreSQL Performance'
> Subject: Re: [PERFORM] max() versus order/limit (WAS: High 
> update activity, PostgreSQL vs BigDBMS)
> 
> 
> Did anybody get a chance to look at this?  Is it expected behavior?
> Everyone seemed so incredulous, I hoped maybe this exposed a 
> bug that would be fixed in a near release.
> 
> 
> -Original Message-
> From: Adam Rich [mailto:[EMAIL PROTECTED]
> Sent: Sunday, January 07, 2007 11:53 PM
> To: 'Joshua D. Drake'; 'Tom Lane'
> Cc: 'Craig A. James'; 'PostgreSQL Performance'
> Subject: RE: [PERFORM] High update activity, PostgreSQL vs BigDBMS
> 
> 
> 
> Here's another, more drastic example... Here the order by / limit
> version
> runs in less than 1/7000 the time of the MAX() version.
> 
> 
> select max(item_id)
> from events e, receipts r, receipt_items ri
> where e.event_id=r.event_id and r.receipt_id=ri.receipt_id
> 
> Aggregate  (cost=10850.84..10850.85 rows=1 width=4) (actual
> time=816.382..816.383 rows=1 loops=1)
>   ->  Hash Join  (cost=2072.12..10503.30 rows=139019 width=4) (actual
> time=155.177..675.870 rows=147383 loops=1)
> Hash Cond: (ri.receipt_id = r.receipt_id)
> ->  Seq Scan on receipt_items ri  (cost=0.00..4097.56
> rows=168196 width=8) (actual time=0.009..176.894 rows=168196 loops=1)
> ->  Hash  (cost=2010.69..2010.69 rows=24571 width=4) (actual
> time=155.146..155.146 rows=24571 loops=1)
>   ->  Hash Join  (cost=506.84..2010.69 rows=24571 width=4)
> (actual time=34.803..126.452 rows=24571 loops=1)
> Hash Cond: (r.event_id = e.event_id)
> ->  Seq Scan on receipts r  (cost=0.00..663.58
> rows=29728 width=8) (actual time=0.006..30.870 rows=29728 loops=1)
> ->  Hash  (cost=469.73..469.73 rows=14843 width=4)
> (actual time=34.780..34.780 rows=14843 loops=1)
>   ->  Seq Scan on events e  (cost=0.00..469.73
> rows=14843 width=4) (actual time=0.007..17.603 rows=14843 loops=1)
> Total runtime: 816.645 ms
> 
> select item_id
> from events e, receipts r, receipt_items ri
> where e.event_id=r.event_id and r.receipt_id=ri.receipt_id
> order by item_id desc limit 1
> 
> 
> Limit  (cost=0.00..0.16 rows=1 width=4) (actual 
> time=0.047..0.048 rows=1
> loops=1)
>   ->  Nested Loop  (cost=0.00..22131.43 rows=139019 width=4) (actual
> time=0.044..0.044 rows=1 loops=1)
> ->  Nested Loop  (cost=0.00..12987.42 rows=168196 width=8)
> (actual time=0.032..0.032 rows=1 loops=1)
>   ->  Index Scan Backward using receipt_items_pkey on
> receipt_items ri  (cost=0.00..6885.50 rows=168196 width=8) (actual
> time=0.016..0.016 rows=1 loops=1)
>   ->  Index Scan using receipts_pkey on receipts r
> (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1
> loops=1)
> Index Cond: (r.receipt_id = ri.receipt_id)
> ->  Index Scan using events_pkey on events e  (cost=0.00..0.04
> rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
>   Index Cond: (e.event_id = r.event_id)
> Total runtime: 0.112 ms
> 
> 
> 
> 
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Joshua D.
> Drake
> Sent: Sunday, January 07, 2007 9:10 PM
> To: Adam Rich
> Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance'
> Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
> 
> 
> On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote:
> > I'm using 8.2 and using order by & limit is still faster than MAX()
> > even though MAX() now seems to rewrite to an almost identical plan
> > internally.
> 
> 
> Gonna need you to back that up :) Can we get an explain analyze?
> 
> 
> > Count(*) still seems to use a full table scan rather than an index
> scan.
> > 
> 
> There is a TODO out there to help this. Don't know if it will 
> get done.
> 
> Joshua D. Drake
> 
> -- 
> 
>   === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> Providing the most comprehensive  PostgreSQL solutions since 1997
>  http://www.commandprompt.com/
> 
> Donate to the PostgreSQL Project: 
> http://www.postgresql.org/about/donate
> 
> 
> 
> 
> ---(end of 
> broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org
> 
> 
> ---(end of 
> broadcast)---
> TIP 7: You can help support the PostgreSQL project by donating at
> 
> http://www.postgresql.org/about/donate
> 
> 


---

Re: [PERFORM] Large table performance

2007-01-14 Thread Luke Lonergan
Mark,

Note that selecting an index column means that Postgres fetches the whole
rows from disk.  I think your performance problem is either: 1) slow disk or
2) index access of distributed data.  If it's (1), there are plenty of
references from this list on how to check for that and fix it.  If it's (2),
see below.

The performance of index accessed data in Postgres depends on how the data
is loaded into the DBMS.  If the records you are fetching are distributed
widely among the 3M records on disk, then the select is going to "hop, skip
and jump" across the disk to get the records you need.  If those records are
stored more closely together, then the fetching from disk is going to be
sequential.  A good example of the best situation for an index is an index
on a date column when the data is loaded sequentially by date.  A query
against a specific date range will result in an ordered fetch from the disk,
which leverages fast sequential access.

The difference in performance between ordered and distributed access is
similar to the difference between "random seek" and "sequential" performance
of the disk subsystem.  The random seek performance of typical disk
subsystems with one thread (for one user in postgres) is 120 seeks per
second.  If your data was randomly distributed, you'd expect about
10,000/120 = 83 seconds to gather these records.  Since you're getting 10
seconds, I expect that your data is lumped into groups and you are getting a
mix of sequential reads and seeks.

Note that adding more disks into a RAID does not help the random seek
performance within Postgres, but may linearly improve the ordered access
speed.  So even with 44 disks in a RAID10 pool on a Sun X4500, the seek
performance of Postgres (and other DBMSs without async or threaded I/O) is
that of a single disk - 120 seeks per second.  Adding more users allows the
seeks to scale on such a machine as users are added, up to the number of
disks in the RAID.  But for your one user example - no help.

If your problem is (2), you can re-order the data on disk by using a CREATE
TABLE statement like this:
  CREATE TABLE fast_table AS SELECT * FROM slow_table ORDER BY teacher_id;
  CREATE INDEX teacher_id_ix ON fast_table (teacher_id);
  VACUUM ANALYZE fast_table;

You should then see ordered access when you do index scans on teacher_id.
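
A related approach, assuming an index on teacher_id already exists on the
original table, is CLUSTER (pre-8.3 syntax shown; the index name here is
hypothetical):

  CLUSTER slow_table_teacher_id_ix ON slow_table;
  ANALYZE slow_table;

Like the CREATE TABLE AS route, this rewrites the table in index order; the
ordering is not maintained for rows inserted later, so it needs to be repeated
periodically.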

- Luke


On 1/12/07 4:31 PM, "Mark Dobbrow" <[EMAIL PROTECTED]> wrote:

> Hello - 
> 
> I have a fairly large table (3 million records), and am fetching 10,000
> non-contiguous records doing a simple select on an indexed column, i.e.
> 
> select grades from large_table where teacher_id = X
> 
> This is a test database, so the number of records is always 10,000 and i have
> 300 different teacher ids.
> 
> The problem is, sometimes fetching un-cached records takes 0.5 secs and
> sometimes (more often) it takes more like 10.0 seconds
> 
> (fetching the same records for a given teacher_id a second time takes about
> 0.25 secs)
> 
> Has anyone seen similar behavior or know what the solution might be?
> 
> any help much appreciated,
> Mark
> 
> 
> 
> ps. My shared_buffers is set at 5000 (kernel max), and work_mem=8192
> 
> 
> 




Re: [PERFORM] Large table performance

2007-01-14 Thread Luke Lonergan
Mark,

This behavior likely depends on how the data is loaded into the DBMS.  If
the records you are fetching are distributed widely among the 3M records on
disk, then 


On 1/12/07 4:31 PM, "Mark Dobbrow" <[EMAIL PROTECTED]> wrote:

> Hello - 
> 
> I have a fairly large table (3 million records), and am fetching 10,000
> non-contiguous records doing a simple select on an indexed column, i.e.
> 
> select grades from large_table where teacher_id = X
> 
> This is a test database, so the number of records is always 10,000 and i have
> 300 different teacher ids.
> 
> The problem is, sometimes fetching un-cached records takes 0.5 secs and
> sometimes (more often) it takes more like 10.0 seconds
> 
> (fetching the same records for a given teacher_id a second time takes about
> 0.25 secs)
> 
> Has anyone seen similar behavior or know what the solution might be?
> 
> any help much appreciated,
> Mark
> 
> 
> 
> ps. My shared_buffers is set at 5000 (kernel max), and work_mem=8192
> 
> 
> 




Re: [PERFORM] table partioning performance

2007-01-08 Thread Luke Lonergan
Colin,


On 1/6/07 8:37 PM, "Colin Taylor" <[EMAIL PROTECTED]> wrote:

> Hi there,  we've partioned a table (using 8.2) by day due to the 50TB of data
> (500k row size, 100G rows) we expect to store it in a year.
> Our performance on inserts and selects against the master table is
> disappointing, 10x slower (with ony 1 partition constraint) than we get  by
> going to the partioned table directly. Browsing the list I get the impression
> this just a case of too many partitions?  would be better off doing partitions
> of partitions ? 

Can you post an "explain analyze" of your query here so we can see what's
going on?

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Scaling concerns

2006-12-16 Thread Luke Lonergan
Tsuraan,

"Select count(*) from bigtable" is testing your disk drive speed up till
about 300MB/s, after which it is CPU limited in Postgres.

My guess is that your system has a very slow I/O configuration, either due
to faulty driver/hardware or the configuration.

The first thing you should do is run a simple I/O test on your data
directory - write a file twice the size of memory using dd like this:

  time bash -c "dd if=/dev/zero of=data_directory/bigfile bs=8k count=(2 *
memory_size / 8192) && sync"

  time dd if=data_directory/bigfile of=/dev/null bs=8k

Then report the times here.

- Luke

On 12/16/06 9:26 AM, "tsuraan" <[EMAIL PROTECTED]> wrote:

> I'm writing a webmail-type application that is meant to be used in a
> corporate environment.  The core of my system is a Postgres database
> that is used as a message header cache.  The two (relevant) tables
> being used are pasted into the end of this message.  My problem is
> that, as the messages table increases to tens of millions of rows,
> pgsql slows down considerably.  Even an operation like "select
> count(*) from messages" can take minutes, with a totally idle system.
> Postgres seems to be the most scalable Free database out there, so I
> must be doing something wrong.
> 
> As for the most common strategy of having a slower (more rows)
> "archival" database and a smaller, faster "live" database, all the
> clients in the company are using their normal corporate email server
> for day-to-day email handling.  The webmail is used to access email
> that's no longer on the corporate server, so it's not really simple to
> say which emails should be considered live and which are really
> out-of-date.
> 
> My postgres settings are entirely default with the exception of
> shared_buffers being set to 40,000 and max_connections set to 400.
> I'm not sure what the meaning of most of the other settings are, so I
> haven't touched them.  The machines running the database servers are
> my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB
> SATA II drive), and a production server with two dual-core Intel
> chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550
> controller.  Both machines are running Gentoo Linux with a 2.6.1x
> kernel, and both exhibit significant performance degradation when I
> start getting tens of millions of records.
> 
> Any advice would be most appreciated.  Thanks in advance!
> 
> Tables:
> 
> CREATE TABLE EmailAddresses (
>   emailid   SERIAL PRIMARY KEY, --  The unique identifier of this address
>   name  TEXT NOT NULL,  --  The friendly name in the address
>   addrspec  TEXT NOT NULL,  --  The [EMAIL PROTECTED] part of the 
> address
>   UNIQUE(name, addrspec)
> );
> 
> and
> 
> CREATE TABLE Messages (
>   -- Store info:
>   msgkeyBIGSERIAL PRIMARY KEY,  --  Unique identifier for a message
>   path  TEXT NOT NULL,  --  Where the message is on the file
> system
>   inserted  TIMESTAMP DEFAULT now(),--  When the message was fetched
>   -- Message Info:
>   msgid TEXT UNIQUE NOT NULL,   --  Message's Message-Id field
>   mfrom INTEGER --  Who sent the message
> REFERENCES EmailAddresses
> DEFAULT NULL,
>   mdate TIMESTAMP DEFAULT NULL, --  Message "date" header field
>   replyto   TEXT DEFAULT NULL,  --  Message-ID of replied-to message
>   subject   TEXT DEFAULT NULL,  --  Message "subject" header field
>   numatch   INTEGER DEFAULT NULL,   --  Number of attachments
>   UNIQUE(path)
> );
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq
> 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Luke Lonergan
Daniel,

Good stuff.

Can you try this with just "-O3" versus "-O2"?

- Luke


On 12/11/06 2:22 PM, "Daniel van Ham Colchete" <[EMAIL PROTECTED]>
wrote:

> Hi yall,
> 
> I made some preliminary tests.
> 
> Before the results, I would like to make some acknowledgments:
> 1 - I didn't show any proof of the things I said until now.
> 2 - It really is a waste of everyone's time to say one thing when I
> can't prove it.
> 
> But everything I said is knowledge I have been accumulating over the past
> few years working on a project where optimization is important. After
> algorithmic optimizations, compiler options are the second item on my list,
> and with my software they show measurable improvement. With the other
> software I use, they seem to run faster, but I didn't measure it.
> 
> TEST PROCEDURE
> 
> I ran this test on a Gentoo test machine I have here. It's a Pentium 4
> 3.0GHz (I don't know which P4) with 1 GB of RAM. It only uses
> SATA drives. I didn't change my glibc (or any other lib) during the
> test. I used GCC 3.4.6.
> 
> I ran each test three times. So we can get an idea about average
> values and standard deviation.
> 
> Each time I ran the test with the following commands:
> dropdb mydb
> createdb mydb
> pgbench -i -s 10 mydb 2> /dev/null
> psql -c 'vacuum analyze' mydb
> psql -c 'checkpoint' mydb
> sync
> pgbench -v -n -t 600 -c 5 mydb
> 
> My postgresql.conf was the default one, except for:
> fsync = 
> shared_buffers = 1
> work_mem = 10240
> 
> Every test's results should begin with the above, but I removed it because
> it's always the same:
> transaction type: TPC-B (sort of)
> scaling factor: 10
> number of clients: 5
> number of transactions per client: 600
> number of transactions actually processed: 3000/3000
> 
> TESTS RESULTS
> ==
> TEST 01: CFLAGS="-O2 -march=i686" fsync=false
> 
> tps = 734.948620 (including connections establishing)
> tps = 736.866642 (excluding connections establishing)
> 
> tps = 713.225047 (including connections establishing)
> tps = 715.039059 (excluding connections establishing)
> 
> tps = 721.769259 (including connections establishing)
> tps = 723.631065 (excluding connections establishing)
> 
> 
> TEST 02: CFLAGS="-O2 -march=i686" fsync=true
> 
> tps = 75.466058 (including connections establishing)
> tps = 75.485675 (excluding connections establishing)
> 
> tps = 75.115797 (including connections establishing)
> tps = 75.135311 (excluding connections establishing)
> 
> tps = 73.883113 (including connections establishing)
> tps = 73.901997 (excluding connections establishing)
> 
> 
> TEST 03: CFLAGS="-O2 -march=pentium4" fsync=false
> 
> tps = 846.337784 (including connections establishing)
> tps = 849.067017 (excluding connections establishing)
> 
> tps = 829.476269 (including connections establishing)
> tps = 832.008129 (excluding connections establishing)
> 
> tps = 831.416457 (including connections establishing)
> tps = 835.31 (excluding connections establishing)
> 
> 
> TEST 04 CFLAGS="-O2 -march=pentium4" fsync=true
> 
> tps = 83.224016 (including connections establishing)
> tps = 83.248157 (excluding connections establishing)
> 
> tps = 80.811892 (including connections establishing)
> tps = 80.834525 (excluding connections establishing)
> 
> tps = 80.671406 (including connections establishing)
> tps = 80.693975 (excluding connections establishing)
> 
> 
> CONCLUSIONS
> Everyone can get their own conclusion. Mine is:
> 
> 1 - You have an improvement when you compile your postgresql using
> processor-specific technologies. With fsync, you have an
> improvement of 9% at the tps rate. Without fsync, the improvement
> is 15.6%.
>
> 2 - You can still improve your indexes, SQL and everything else; this
> only adds another possible improvement.
> 
> 3 - I can't prove this but I *think* that this is related to the fact
> that GCC knows how to do the same thing better on each processor.
> 
> 4 - I'm still using source-based distros.
> 
> WHAT NOW
> There are other things I wish to test:
>  1 - What efect an optimized glibc has on PostgreSQL?
>  2 - How much improvement can I get playing with my postgresql.conf.
>  3 - What efect optimizations have with concurrency?
>  4 - What if I used Intel C++ Compiler instead of GCC?
>  5 - What if I use GCC 4.1.1 instead of GCC 3.4.6?
> 
> I'm thinking about writing a script to make all the tests (more than 3
> times each), get the data and plot some graphs.
> 
> I don't have the time right now to do it, maybe next week I'll have.
> 
> I invite everyone to comment on or suggest changes to the procedure or the results.
> 
> Best regards,
> Daniel Colchete
> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster
> 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message

Re: [PERFORM] Looking for hw suggestions for high concurrency

2006-12-11 Thread Luke Lonergan
Merlin,

On 12/11/06 12:19 PM, "Merlin Moncure" <[EMAIL PROTECTED]> wrote:

> ...and this 6 of them (wow!). the v40z was top of its class.  Will K8L
> run on this server?

No official word yet.

The X4600 slipped in there quietly under the X4500 (Thumper) announcement,
but it's a pretty awesome server.  It's been in production for
supercomputing in Japan for a long while now, so I'd trust it.

- Luke



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Luke Lonergan
Michael,

On 12/11/06 10:57 AM, "Michael Stone" <[EMAIL PROTECTED]> wrote:

> That's kinda the opposite of what I meant by general code. I was trying
> (perhaps poorly) to distinguish between scientific codes and other
> stuff (especially I/O or human interface code).

Yes - choice of language has often been a differentiator in these markets -
LISP versus FORTRAN, C++ versus SQL/DBMS.  This isn't just about science,
it's also in Business Intelligence - e.g. Special purpose datamining code
versus algorithms expressed inside a data management engine.
  
> It also sounds like code specifically written to take advantage of
> compiler techniques, rather than random code thrown at a pile of cflags.
> I don't disagree that it is possible to get performance improvements if
> code is written to be performant code; I do (and did) disagree with the
> idea that you'll get huge performance improvements by taking regular old
> C application code and playing with compiler flags.

Agreed - that's my point exactly.
 
> IMO that's appropriate for some science codes (although I think even
> that sector is beginning to find that they've gone too far in a lot of
> ways), but for a database I'd rather have people debugging clean, readable
> code than risking my data to something incomprehensible that runs in
> optimal time.

Certainly something of a compromise is needed.
 
>> Column databases like C-Store remove these abstractions at planner time to
>
> gcc --make-it-really-fast-by-rewriting-it-from-the-ground-up?

Maybe not from ground->up, but rather from about 10,000 ft -> 25,000 ft?

There are some who have done a lot of work studying the impact of more
efficient DBMS, see here:
  http://homepages.cwi.nl/~boncz/x100.html

- Luke



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Looking for hw suggestions for high concurrency

2006-12-11 Thread Luke Lonergan
The Sun X4600 is very good for this, the V40z is actually EOL so I'd stay
away from it.

You can currently do 8 dual core CPUs with the X4600 and 128GB of RAM and
soon you should be able to do 8 quad core CPUs and 256GB of RAM.

- Luke 


On 12/11/06 8:26 AM, "Cosimo Streppone" <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> I'd like to get suggestions from all you out there for
> a new Postgresql server that will replace an existing one.
> 
> My performance analysis shows very *low* iowaits,
> and very high loads at times of peak system activity.
> The average concurrent processes number is 3/4, with peaks of 10/15.
> *Sustained* system load varies from 1.5 to 4, while peak load
> reaches 20 and above, always with low iowait%.
> I see this as a clear sign of more processors power need.
> 
> I'm aware of the context-switching storm problem, but here
> the cs stays well under 50,000, so I think it's not the problem
> here.
> 
> Current machine is an Acer Altos R700 (2 Xeon 2.8 Ghz, 4 Gb RAM),
> similar to this one (R710):
> http://www.acer.co.uk/acereuro/page9.do?sp=page4&dau34.oid=7036&UserCtxParam=0
> &GroupCtxParam=0&dctx1=17&CountryISOCtxParam=UK&LanguageISOCtxParam=en&ctx3=-1
> &ctx4=United+Kingdom&crc=334044639
> 
> So, I'm looking for advice on a mid-range OLTP server with
> the following (flexible) requirements:
> 
> - 4 physical processors expandable to 8 (dual core preferred),
>either Intel or AMD
> - 8 Gb RAM exp. to at least 32 Gb
> - Compatibility with Acer S300 External storage enclosure.
>The "old" server uses that, and it is boxed with 15krpm hdds
>in RAID-10, which do perfectly well their job.
>The enclosure is connected via 2 x LSI Logic PCI U320 controllers
> - On-site, same-day hardware support contract
> - Rack mount
> 
> Machines we're evaluating currently include:
> - Acer Altos R910
> - Sun Fire V40Z,
> - HP Integrity RX4640
> - IBM eServer 460
> 
> Experiences on these machines?
> Other suggestions?
> 
> Thanks.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Luke Lonergan
Michael,

On 12/11/06 9:31 AM, "Michael Stone" <[EMAIL PROTECTED]> wrote:

> [1] I will say that I have never seen a realistic benchmark of general
> code where the compiler flags made a statistically significant
> difference in the runtime.

Here's one - I wrote a general purpose Computational Fluid Dynamics analysis
method used by hundreds of people to perform aircraft and propulsion systems
analysis.  Compiler flag tuning would speed it up by factors of 2-3 or even
more on some architectures.  The reason it was so effective is that the
structure of the code was designed to be general, but also to expose the
critical performance sections in a way that the compilers could use - deep
pipelining/vectorization, unrolling, etc, were carefully made easy for the
compilers to exploit in critical sections.  Yes, this made the code in those
sections harder to read, but it was a common practice because it might take
weeks of runtime to get an answer and performance mattered.

The problem I see with general purpose DBMS code the way it's structured in
pgsql (and others) is that many of the critical performance sections are
embedded in abstract interfaces that obscure them from optimization.  An
example is doing a simple "is equal to" operation has many layers
surrounding it to ensure that UDFs can be declared and that special
comparison semantics can be accomodated.  But if you're simply performing a
large number of INT vs. INT comparisons, it will be thousands of times
slower than a CPU native operation because of the function call overhead,
etc.  I've seen presentations that show IPC of Postgres at about 0.5, versus
the 2-4 possible from the CPU.

Column databases like C-Store remove these abstractions at planner time to
expose native operations in large chunks to the compiler and the IPC
reflects that - typically 1+ and as high as 2.5.  If we were to redesign the
executor and planner to emulate that same structure we could achieve similar
speedups and the compiler would matter more.

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] File Systems Compared

2006-12-06 Thread Luke Lonergan
Brian,

On 12/6/06 8:40 AM, "Brian Hurt" <[EMAIL PROTECTED]> wrote:

> But actually looking things up, I see that PCI-Express has a theoretical 8
> Gbit/sec, or about 800Mbyte/sec. It's PCI-X that's 533 MByte/sec.  So there's
> still some headroom available there.

See here for the official specifications of both:
  http://www.pcisig.com/specifications/pcix_20/

Note that PCI-X version 1.0 at 133MHz runs at 1GB/s.  It's a parallel bus,
64 bits wide (8 bytes) and runs at 133MHz, so 8 x 133 ~= 1 gigabyte/second.

PCI Express with 16 lanes (PCIe x16) can transfer data at 4GB/s.  The Arecas
use (PCIe x8, see here:
http://www.areca.com.tw/products/html/pcie-sata.htm), so they can do 2GB/s.

- Luke 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] File Systems Compared

2006-12-06 Thread Luke Lonergan
Brian,

On 12/6/06 8:02 AM, "Brian Hurt" <[EMAIL PROTECTED]> wrote:

> These numbers are close enough to bus-saturation rates

PCIX is 1GB/s + and the memory architecture is 20GB/s+, though each CPU is
likely to obtain only 2-3GB/s.

We routinely achieve 1GB/s I/O rate on two 3Ware adapters and 2GB/s on the
Sun X4500 with ZFS.

> advise new people setting up systems to go this route over spending
> money on some fancy storage area network solution

People buy SANs for interesting reasons, some of them having to do with the
manageability features of high end SANs.  I've heard it said in those cases
that "performance doesn't matter much".

As you suggest, database replication provides one of those features, and
Solaris ZFS has many of the data management features found in high end SANs.
Perhaps we can get the best of both?

In the end, I think SAN vs. server storage is a religious battle.

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance of ORDER BY

2006-12-05 Thread Luke Lonergan
Glenn,

On 12/5/06 9:12 AM, "Glenn Sullivan" <[EMAIL PROTECTED]> wrote:

> I am wanting some ideas about improving the performance of ORDER BY in
> our use.  I have a DB on the order of 500,000 rows and 50 columns.
> The results are always sorted with ORDER BY.  Sometimes, the users end up
> with a search that matches most of the rows.  In that case, I have a
> LIMIT 5000 to keep the returned results under control.  However, the
> sorting seems to take 10-60 sec.  If I do the same search without the
> ORDER BY, it takes about a second.
> 
> I am currently on version 8.0.1 on Windows XP using a Dell Optiplex 280
> with 1Gb of ram.  I have set sort_mem=10 set.
> 
> Any ideas?

Upgrade to 8.1 or 8.2, there were very large performance improvements to the
sort code made for 8.1+.  Also, when you've upgraded, you can experiment
with increasing work_mem to get better performance.  At some value of
work_mem (probably > 32MB) you will reach a plateau of performance, probably
4-5 times faster than what you see with 8.0.
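
Once on 8.1+, a per-session experiment is enough to find that plateau (table
and column names are placeholders; try several work_mem values):

  SET work_mem = 32768;            -- in kB, i.e. 32MB
  EXPLAIN ANALYZE
  SELECT * FROM mytable
  WHERE somecol > 0
  ORDER BY somecol DESC
  LIMIT 5000;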

- Luke 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Fw: [GENERAL] Including unique users in huge data

2006-11-29 Thread Luke Lonergan
Mark,

This fits the typical pattern of the "Big Honking Datamart" for clickstream
analysis, a usage pattern that stresses the capability of all DBMS.  Large
companies spend $1M + on combinations of SW and HW to solve this problem,
and only the large scale parallel DBMS can handle the load.  Players in the
market include Oracle, IBM, Teradata, Netezza and of course Greenplum.

Unfortunately, techniques like bitmap indexes only improve things by factors
O(10).  Parallelism is the only proven answer to get O(10,000) improvements
in response time. Furthermore, simply speeding the I/O underneath one CPU
per query is insufficient, the query and loading engine need to scale CPU
and storage access together.

=== start description of commercial Postgres solution ===
=== If commercial solutions offend, skip this section ===

The parallel I/O and CPU of Greenplum DB (formerly Bizgres MPP) is designed
for exactly this workload, where a combination of scalable I/O and CPU is
required to speed these kinds of queries (sessionizing weblogs, creating
aggregates, direct ad-hoc analysis).

One of our customers doing clickstream analysis uses a combination of
sessionizing ELT processing with Greenplum DB + Bizgres KETL and
Microstrategy for the reporting frontend.  The complete system is 1/100 the
price of systems that are slower.

We routinely see speedups of over 100 compared to large scale multi-million
dollar commercial solutions and have reference customers who are regularly
working with Terabytes of data.

=== end commercial solution description ===

- Luke  

On 11/29/06 11:43 AM, "Mark Jensen" <[EMAIL PROTECTED]> wrote:

> posting this here instead of the GENERAL list...richard is right, this is more
> of a performance question than a general question.
> 
> thanks,
>  
> 
> Mark Jensen
> 
> - Forwarded Message 
> From: Mark Jensen <[EMAIL PROTECTED]>
> To: Richard Huxton 
> Cc: pgsql-general@postgresql.org
> Sent: Wednesday, November 29, 2006 2:40:58 PM
> Subject: Re: [GENERAL] Including unique users in huge data warehouse in
> Postgresql...
> 
> thanks Richard.  I've been talking to Ron Mayer about this as well offline.  I
> think the main problem is dedupping users, and not being able to aggregate
> visits in the fact table.  that's where most of the query time takes place.
> but the business guys just won't accept using visits, but not actual uniques
> dedupped.  if visits was in the fact table as an integer i could sum up, i'd
> be fine.  Ron also said he's put the unique user ids into arrays so it's
> faster to count them, but placing them into aggregate tables.  only problem is
> i'd still have to know what things to aggregate by to create these, which is
> impossible since we have so many dimensions and facts that are going to be
> ad-hoc.  i have about 20 summary tables i create per day, but most of the
> time, they have something new they want to query that's not in summary.  and
> will never come up again.
> 
> I tried installing Bizgres using their Bizgres loader and custom postgresql
> package with bitmap indexes, but doesn't seem to increase performance "that"
> much.  or as much as i would like compared to the normal postgresql install.
> loads are pretty slow when using their bitmap indexes compared to just using
> btree indexes in the standard postgresql install.  Query time is pretty good,
> but i also have to make sure load times are acceptable as well.  and had some
> problems with the bizgres loader losing connection to the database for no
> reason at all, but when using the normal copy command in 8.2RC1, works fine.
> love the new query inclusion in the copy command by the way, makes it so easy
> to aggregrate hourly fact tables into daily/weekly/monthly in one shot :)
> 
> and yes, work_mem is optimized as much as possible.  postgresql is using about
> 1.5 gigs of working memory when it runs these queries.  looking into getting
> 64 bit hardware with 16-32 gigs of RAM so i can throw most of this into memory
> to speed it up.  we're also using 3par storage which is pretty fast.  we're
> going to try and put postgresql on a local disk array using RAID 5 as well to
> see if it makes a difference.
> 
> and yes, right now, these are daily aggregate tables summed up from the
> hourly.  so about 17 million rows per day.  hourly fact tables are impossible
> to query right now, so i have to at least put these into daily fact tables.
> so when you have 30 days in this database, then yes, table scans are going to
> be huge, thus why it's taking so long, plus dedupping on unique user id :)
> 
> and you're right, i should put this on the performance mailing list... see you
> there :)
> 
> thanks guys.
>  
> 
> Mark Jensen
> 
> - Original Message 
> From: Richard Huxton 
> To: Mark Jensen <[EMAIL PROTECTED]>
> Cc: pgsql-general@postgresql.org
> Sent: Wednesday, Novem

Re: [PERFORM] TPC-H Benchmark

2006-11-24 Thread Luke Lonergan
http://www.tpc.org/tpch/spec/tpch_20060831.tar.gz

- Luke

On 11/24/06 8:47 AM, "Felipe Rondon Rocha" <[EMAIL PROTECTED]> wrote:

> Hi everyone,
>  
> does anyone have the TPC-H benchmark for PostgreSQL? Can you tell me where can
> i find the database and queries?
>  
> Thks,
> Felipe
> 




Re: [PERFORM] availability of SATA vendors

2006-11-22 Thread Luke Lonergan
Arjen,

As usual, your articles are excellent!

Your results show again that the 3Ware 9550SX is really poor at random I/O
with RAID5 and all of the Arecas are really good.  3Ware/AMCC have designed
the 96xx to do much better for RAID5, but I've not seen results - can you
get a card and test it?

We now run the 3Ware controllers in RAID10 with 8 disks each and they have
been excellent.  Here (on your site) are results that bear this out:
  http://tweakers.net/reviews/639/9

- Luke


On 11/22/06 11:07 AM, "Arjen van der Meijden" <[EMAIL PROTECTED]>
wrote:

> Jeff,
> 
> You can find some (Dutch) results here on our website:
> http://tweakers.net/reviews/647/5
> 
> You'll find the AMCC/3ware 9550SX-12 with up to 12 disks, Areca 1280 and
> 1160 with up to 14 disks and a Promise and LSI sata-raid controller with
> each up to 8 disks. Btw, that Dell Perc5 (sas) is afaik not the same
> card as the LSI MegaRAID SATA 300-8X, but I have no idea whether they
> share the same controllerchip.
> In most of the graphs you also see a Areca 1160 with 1GB in stead of its
> default 256MB. Hover over the labels to see only that specific line,
> that makes the graphs quite readable.
> 
> You'll also see a Dell Perc5/e in the results, but that was done using
> Fujitsu SAS 15k rpm drives, not the WD Raptor 10k rpm's
> 
> If you dive deeper in our (still Dutch) "benchmark database" you may
> find some results of several disk-configurations on several controllers
> in various storage related tests, like here:
> http://tweakers.net/benchdb/test/193
> 
> If you want to filter some results, look for "Resultaatfilter &
> tabelgenerator" and press on the "Toon filteropties"-tekst. I think
> you'll be able to understand the selection-overview there, even if you
> don't understand Dutch ;)
> "Filter resultaten" below means the same as in English (filter [the]
> results)
> 
> Best regards,
> 
> Arjen
> 
> On 22-11-2006 17:36 Jeff Frost wrote:
>> On Wed, 22 Nov 2006, Bucky Jordan wrote:
>> 
>>> Dells (at least the 1950 and 2950) come with the Perc5, which is
>>> basically just the LSI MegaRAID. The units I have come with a 256MB BBU,
>>> I'm not sure if it's upgradeable, but it looks like a standard DIMM in
>>> there...
>>> 
>>> I posted some dd and bonnie++ benchmarks of a 6-disk setup a while back
>>> on a 2950, so you might search the archive for those numbers if you're
>>> interested- you should be able to get the same or better from a
>>> similarly equipped LSI setup. I don't recall if I posted pgbench
>>> numbers, but I can if that's of interest.
>> 
>> I could only find the 6 disk RAID5 numbers in the archives that were run
>> with bonnie++1.03.  Have you run the RAID10 tests since?  Did you settle
>> on 6 disk RAID5 or 2xRAID1 + 4XRAID10?
>> 
> 





Re: [PERFORM] availability of SATA vendors

2006-11-17 Thread Luke Lonergan
Jeff,

On 11/17/06 11:45 AM, "Jeff Frost" <[EMAIL PROTECTED]> wrote:

> I see many of you folks singing the praises of the Areca and 3ware SATA
> controllers, but I've been trying to price some systems and am having trouble
> finding a vendor who ships these controllers with their systems.  Are you
> rolling your own white boxes or am I just looking in the wrong places?
> 
> Currently, I'm looking at Penguin, HP and Sun (though Sun's store isn't
> working for me at the moment).  Maybe I just need to order a Penguin and then
> buy the controller separately, but was hoping to get support from a single
> entity.

Rackable or Asacomputers sell and support systems with the 3Ware or Areca
controllers.

- Luke





Re: [PERFORM] Easy read-heavy benchmark kicking around?

2006-11-08 Thread Luke Lonergan
Similar experiences with HP and their SmartArray 5i controller on Linux.
The answer was: "this controller has won awards for performance! It can't be
slow!", so we made them test it in their own labs an prove just how awfully
slow it was.  In the case of the 5i, it became apparent that HP had no
internal expertise on Linux and their controllers, the driver was built by a
third party that they didn't support and their performance people didn't
deal with the 5i at all.

In the end, no amount of benchmarking after you've purchased is a good
substitute for the up-front question: do you have documentation of the
performance of your RAID controller on [Linux, Solaris, ...]?

I would like everyone who purchases IBM, Dell, HP or Sun to demand that
documentation - then perhaps we'd see higher quality drivers and hardware
result.

- Luke


On 11/8/06 8:34 AM, "Merlin Moncure" <[EMAIL PROTECTED]> wrote:

> On 11/8/06, Markus Schaber <[EMAIL PROTECTED]> wrote:
>> Hi, Brian,
>> 
>> Brian Hurt wrote:
>> 
>>> So the question is: is there an easy to install and run, read-heavy
>>> benchmark out there that I can wave at them to get them to fix the
>>> problem?
>> 
>> For sequential read performance, use dd. Most variants of dd I've seen
>> output some timing information, and if not, do a "time dd
>> if=/your/device of=/dev/null bs=1M" on the partition.
> 
> we had a similar problem with a hitachi san, the ams200.  Their
> performance group refused to admit that a 50MB/sec dd result was
> a valid performance benchmark that needed to be addressed.  Yes, that
> was a HITACHI SAN, the AMS200, which hitachi's performance group
> claimed was 'acceptable performance'.  This was the advice we got
> after swapping out all the hardware and buying an entitlement to
> redhat enterprise which we had to do to get them to talk to us.
> 
> oh, the unit also lost a controller after about a week of
> operation...the unit being a HITACHI SAN, the AMS200.
> 
> any questions?
> 
> merlin
> 
> p.s. we have had good experiences with the adtx.
> 





Re: [PERFORM] Easy read-heavy benchmark kicking around?

2006-11-06 Thread Luke Lonergan
Select count(*) from table-twice-size-of-ram

Divide the number of pages in the table times the page size (normally 8KB) by
the query time and you have your net disk rate.
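
For anyone who wants to reproduce this, here is a minimal sketch of that
measurement in psql; big_table is a placeholder name, and it assumes the
stock 8KB page size and reasonably fresh statistics (relpages is only
refreshed by VACUUM/ANALYZE):

  \timing

  -- Full scan of a table roughly twice the size of RAM; with \timing on,
  -- psql reports the elapsed time after the query finishes.
  SELECT count(*) FROM big_table;

  -- Heap size in bytes: planner page count times the 8KB default page size.
  SELECT relpages::bigint * 8192 AS table_bytes
  FROM pg_class
  WHERE relname = 'big_table';

  -- Net disk rate in MB/s is then roughly:
  --   table_bytes / elapsed_seconds / (1024 * 1024)

(On 8.1 and later, pg_relation_size('big_table') reports the heap size in
bytes directly, without relying on the planner's relpages estimate.)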

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Brian Hurt [mailto:[EMAIL PROTECTED]
Sent:   Monday, November 06, 2006 03:49 PM Eastern Standard Time
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Easy read-heavy benchmark kicking around?

I'm having a spot of trouble with our storage device vendor.  Read 
performance (as measured by both bonnie++ and hdparm -t) is abysmal 
(~14Mbyte/sec), and we're trying to get them to fix it.  Unfortunately, 
they're using the fact that bonnie++ is an open source benchmark to 
weasel out of doing anything: they can't fix it unless I can show an 
impact in PostgreSQL.

So the question is: is there an easy to install and run, read-heavy 
benchmark out there that I can wave at them to get them to fix the 
problem?  I have a second database running on a single SATA drive, so I 
can use that as a comparison point- "look, we're getting 1/3rd the read 
speed of a single SATA drive- this sucks!"

Any advice?

Brian







Re: [PERFORM] Help w/speeding up range queries?

2006-10-31 Thread Luke Lonergan
John,

On 10/31/06 8:29 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

>> 'chrX' and StartPosition > 1000500 and EndPosition < 200;
> 
> Also, there's the PostGIS stuff, though it might be overkill for what
> you want.

Oops - I missed the point earlier.  Start and End are separate attributes, so
this is like an unbounded window in (Start, End) space.  PostGIS provides
quadtree-style indexing that would yield a terse TID list, but you still have
the problem of ensuring that the heap tuples being scanned are retrieved
efficiently, which only happens if they are laid out on disk similarly to the
retrieval pattern, right?

- Luke
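
For anyone following along, here is a minimal sketch of that kind of
two-dimensional indexing using only core PostgreSQL rather than PostGIS:
map each row's (start, end) pair to a zero-area box and index it with GiST.
The table name features, the column names startposition/endposition, and
the numeric bounds are all placeholders, not John's actual schema.

  -- Index each (start, end) pair as a degenerate box so a GiST index can
  -- serve "start > a AND end < b" window queries.
  CREATE INDEX features_span_gist ON features USING gist
      ((box(point(startposition, endposition),
            point(startposition, endposition))));

  -- The window query as a box-containment test; 1e15 just stands in for
  -- "no upper limit on start".  Box containment is inclusive at the edges,
  -- so the original strict comparisons are kept as an extra filter.
  SELECT *
  FROM features
  WHERE box(point(startposition, endposition),
            point(startposition, endposition))
        <@ box(point(1000500, 0), point(1e15, 2000000))
    AND startposition > 1000500
    AND endposition < 2000000;

This only addresses getting a tight TID list; as the paragraph above says,
whether the heap fetches end up cheap still depends on how the rows are
physically clustered relative to the retrieval pattern.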




