Re: [PERFORM] NFS, file system cache and shared_buffers

2014-05-27 Thread David Boreham


On 5/27/2014 9:09 AM, Shaun Thomas wrote:

On 05/27/2014 10:00 AM, Albe Laurenz wrote:


I know that Oracle recommends it - they even built an NFS client
into their database server to make the most of it.


That's odd. Every time the subject of NFS comes up, it's almost 
immediately shot down with explicit advice to Never Do That(tm). It 
can be kinda safe-ish if mounted in sync mode with caching disabled, 
but I'd never use it on any of our systems.


It has been a long time since I was in the weeds of this issue, but the 
crux is that it was (still is?) hard to be sure that the filesystem's 
behavior was exactly as expected. My recollection of the Oracle story 
was that they had to verify the end-to-end behavior, and essentially 
certify its correctness to guarantee the database's ACID properties. So 
you needed to be running a very specific version of the NFS code, 
configured in a very specific way. This isn't entirely inconsistent with 
the reference above to their having built an NFS client: that's 
something you might need to do in order to be sure it behaves the way 
you expect. Possibly the NFS implementations deployed today are more 
consistent and correct than was the case in the past. I wouldn't use a 
network filesystem for any kind of database storage myself, though.







--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Profiling PostgreSQL

2014-05-22 Thread David Boreham

On 5/22/2014 7:27 AM, Dimitris Karampinas wrote:
Is there any way to get the call stack of a function when profiling 
PostgreSQL with perf ?
I configured with --enable-debug, I ran a benchmark against the system, 
and I'm able to identify a bottleneck.
40% of the time is spent on a spinlock, yet I cannot find out the 
code path that gets me there.

Using --call-graph with perf record didn't seem to help.

Any ideas ?

Can you arrange to run 'pstack' a few times on the target process 
(either manually or with a shell script)?
If the probability of the process being in the spinning state is high, 
then this approach should snag you at least one call stack.
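A minimal sampler along those lines, as a sketch: `pstack` availability and the sampling interval are assumptions, and the PID in the example is illustrative. On systems without pstack, something like `gdb -batch -ex bt -p` is a common stand-in.

```shell
# sample_stacks PID [N]: grab the call stack of a process N times,
# printing a separator between samples. Eyeballing (or sorting) the
# output shows which stacks dominate. SAMPLER and SAMPLE_INTERVAL are
# overridable so the sketch can be adapted (e.g. gdb instead of pstack).
sample_stacks() {
    pid=$1
    n=${2:-20}
    i=0
    while [ "$i" -lt "$n" ]; do
        ${SAMPLER:-pstack} "$pid"
        echo "---"
        sleep "${SAMPLE_INTERVAL:-1}"
        i=$((i + 1))
    done
}

# Example: 20 samples of a busy backend
# sample_stacks 12345 20
```

If perf is preferred, `perf record -g --call-graph dwarf` sometimes recovers stacks that the default frame-pointer unwinding misses, assuming a perf build with DWARF support.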







Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread David Boreham

On 11/26/2013 7:26 AM, Craig James wrote:


For those of us with small installations (a few to a dozen servers), 
we'd like to get out of server maintenance completely. Can anyone with 
experience on a cloud VM solution comment?  Do the VM solutions provided 
by the major hosting companies have the same good performance as the 
VMs that several have described here?


Obviously there's Amazon's new Postgres solution available.  What else 
is out there in the way of instant-on solutions with 
Linux/Postgres/Apache preconfigured systems?  Has anyone used them in 
production?


I've done some work with Heroku and the MySQL flavor of AWS service.
They work, and are convenient, but there are a couple of issues:

1. Random odd (and bad) things can happen from a performance perspective 
that you just have to cope with. E.g. I/O will become vastly slower for 
periods of tens of seconds, once or twice a day. If you don't like the 
idea of phenomena like this in your system, beware.


2. Your inability to connect with the bare metal may turn out to be a 
significant hassle when trying to understand some performance issue in 
the future. Tricks that we're used to using, such as looking at iostat 
(or even top) output, are no longer usable because the hosting company 
will not give you a login on the host VM. This limitation extends to 
many, many techniques that have been commonly used in the past, and can 
become a major headache, to the point where you need to reproduce the 
system on physical hardware just to understand what's going on with it 
(been there, done that...)


For the reasons above I would caution against deploying a production 
service (today) on a SaaS database service like Heroku or Amazon RDS.
Running your own database inside a stock VM might be better, but it can 
be hard to get the right kind of I/O for that deployment scenario.
In the case of self-hosted VMware or KVM you obviously have much more 
control and observability.


Heroku had (at least when I last used it, a year or so ago) an 
additional issue in that they host on AWS VMs, so if something goes wrong 
you are talking to one company that is using another company's virtual 
machine service. Not a recipe for clarity, good service and hair 
retention...










Re: [PERFORM] AMD vs Intel

2013-09-04 Thread David Boreham

On 9/4/2013 3:01 AM, Johan Loubser wrote:
I am tasked with getting specs for a Postgres database server for the 
core purpose of running Moodle at our university.
The main question at the moment is 12-core AMD or 6/8-core (E series) 
Intel.


What would be the most important metric in planning an enterprise-level 
server for Moodle?




The only way to be sure is to test your workload on the two different 
machines.


That said, we moved from AMD to Intel E-series Xeons a year ago. We were 
using 8-core AMD parts, not 12, and switched to 6-core Intel. The Intel 
parts were more attractive due to their (much) higher single-core 
performance. Another factor was a history of hardware bugs in the 
support chips used on AMD systems that we haven't generally seen with 
Intel. So, all other things being equal, Intel brings less hassle, in my 
experience.









Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread David Boreham

On 5/22/2013 8:18 AM, Greg Smith wrote:


They can easily hit that number.  Or they can do this:

Device:      r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdd      2702.80   19.40   19.67    0.16     14.91    273.68   71.74   0.37  100.00
sdd      2707.60   13.00   19.53    0.10     14.78    276.61   90.34   0.37  100.00

That's an Intel 710 being crushed by a random read database server 
workload, unable to deliver even 3000 IOPS / 20MB/s.  I have hours of 
data like this from several servers.


This is interesting. Do you know what it is about the workload that 
leads to the unusually low rps ?








Re: [PERFORM] New server setup

2013-03-20 Thread David Boreham

On 3/20/2013 6:44 PM, David Rees wrote:

On Thu, Mar 14, 2013 at 4:37 PM, David Boreham david_l...@boreham.org wrote:

You might want to evaluate the performance you can achieve with a single SSD
(use several for capacity by all means) before considering a RAID card + SSD
solution.
Again, I bet it depends on the application, but our experience with the older
Intel 710 series is that their performance out-runs the CPU, at least under
our PG workload.

How many people are using a single enterprise grade SSD for production
without RAID? I've had a few consumer grade SSDs brick themselves -
but are the enterprise grade SSDs, like the new Intel S3700 which you
can get in sizes up to 800GB, reliable enough to run as a single drive
without RAID1? The performance of one is definitely good enough for
most medium sized workloads without the complexity of a BBU RAID and
multiple spinning disks...



You're replying to my post, but I'll raise my hand again :)

We run a bunch of single-socket 1U, short-depth machines (Supermicro 
chassis) using 1x Intel 710 drives (we'd use the S3700 in new deployments 
today). The most recent of these have 128 GB of RAM and a hex-core 
E5-2620 CPU, and dissipate less than 150 W at full load.


Couldn't be happier with the setup. We have 18 months of uptime with no 
drive failures, running at several hundred writes/sec, 24x7. We also 
write tens of GB of log files every day that are rotated, so the drives 
are getting beaten up on bulk data overwrites too.


There is of course a non-zero probability of some unpleasant firmware 
bug afflicting the drives (as with regular spinning drives), and 
initially we deployed a spare 10k HD in each chassis, spun down, that 
would allow us to re-jigger the machines remotely to run without the SSD 
(the data center is 1,000 miles away). We never had to do that, and 
later deployments omitted the spare HD. We've also considered mixing 
SSDs from two vendors for firmware-bug diversity, but so far we only 
have one approved vendor (Intel).


















Re: [PERFORM] New server setup

2013-03-13 Thread David Boreham

On 3/13/2013 1:23 PM, Steve Crawford wrote:


What concerns me more than wear is this:

InfoWorld Article:
http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715 



Referenced research paper:
https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault 



Kind of messes with the D in ACID.


It is somewhat surprising to discover that many SSD products are not 
durable under sudden power loss (what were they thinking!? ...and why 
doesn't anyone care??).


However, there is a set of SSD types known to be designed to address 
power-loss events that have been tested by contributors to this list.
Use only those devices and you won't see this problem. SSDs do have a 
wear-out mechanism, but wear can be monitored and devices replaced in 
advance of failure. In practice longevity is such that most machines 
will be in the dumpster long before the SSD wears out. We've had 
machines running at several hundred writes/sec constantly for 18 months 
using Intel 710 drives, and the wear-level SMART value is still zero.
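Checking wear is a one-liner with smartmontools; a sketch (the attribute name and column layout are as seen on Intel 710/320 drives and may differ on other vendors' devices):

```shell
# wear_level: filter `smartctl -A` output down to the normalized value
# of the media-wearout attribute (ID 233, Media_Wearout_Indicator, on
# Intel 710/320 drives). The value starts at 100 and counts down as the
# NAND wears; plan replacement well before it reaches the threshold.
wear_level() {
    awk '$2 == "Media_Wearout_Indicator" { print $4 }'
}

# Example (needs root and smartmontools installed):
# smartctl -A /dev/sda | wear_level
```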


In addition, like any electronics module (CPU, memory, NIC), an SSD can 
fail, so you do need to arrange for valuable data to be replicated.
As with old-school disk drives, firmware bugs are a concern, so you might 
want to consider what would happen if all the drives of a particular 
type decided to quit working at the same second in time (I've only 
seen this happen myself with magnetic drives, but in theory it could 
happen with SSDs).










Re: [PERFORM] New server setup

2013-03-13 Thread David Boreham

On 3/13/2013 9:29 PM, Mark Kirkwood wrote:
Just going through this now with a vendor. They initially assured us 
that the drives had end-to-end protection so we did not need to 
worry. I had to post stripdown pictures from Intel's S3700, showing 
obvious capacitors attached to the board, before I was taken seriously 
and actually meaningful specifications were revealed. So now I'm 
demanding to know:


- chipset (and version)
- original manufacturer (for re-badged ones)
- power off protection *explicitly* mentioned
- show me the circuit board (and where are the capacitors) 


In addition to the above, I only use drives where I've seen compelling 
evidence that plug pull tests have been done and passed (e.g. done by 
someone on this list or in-house here).  I also like to have a high 
level of confidence in the firmware development group. This results in a 
very small set of acceptable products :(






Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 7:13 PM, Evgeny Shishkin wrote:
Yes, I am aware of this issue. Never experienced it on either the Intel 
520 or the OCZ Vertex 3.

Have you heard of them on this list?

People have done plug-pull tests and reported the results on the list 
(sometime in the past couple of years).


But you don't need to do the test to know these drives are not safe. 
They're unsafe by design.







Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 7:20 PM, Evgeny Shishkin wrote:

Oh, there is no 100% safe system.

In this case we're discussing specifically safety in the event of power 
loss shortly after the drive indicates to the controller that it has 
committed a write operation. Some drives do provide 100% safety against 
this event, and they don't cost much more than those that don't.







Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 7:38 PM, Evgeny Shishkin wrote:

Which drives would you recommend? Besides intel 320 and 710.

Those are the only drive types we have deployed in servers at present 
(almost all 710s, but we have some 320s for less mission-critical 
machines). The new DC S3700 series looks nice too, but isn't yet in the 
sales channel :

http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html







Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 7:49 PM, Evgeny Shishkin wrote:
Yeah, the S3700 looks promising, but the SATA interface is a limiting 
factor for this drive.
I'm looking towards the SMART SSD :
http://www.storagereview.com/smart_storage_systems_optimus_sas_enterprise_ssd_review



What don't you like about SATA ?

I prefer to avoid SAS drives if possible due to the price premium for 
dubious benefits besides vague hand-waving enterprise-ness promises.







Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 8:11 PM, Evgeny Shishkin wrote:


Quoting 
http://www.storagereview.com/intel_ssd_dc_s3700_series_enterprise_ssd_review

Heh. A fine example of the kind of hand-waving of which I spoke ;)

Higher performance is certainly a benefit, although at present we can't 
saturate even a single 710 series drive (the application, CPU, OS, etc. 
is the bottleneck). Similarly, while dual-porting certainly has its uses, 
it is not something I need.








Re: [PERFORM] HT on or off for E5-26xx ?

2012-11-08 Thread David Boreham

On 11/8/2012 6:58 AM, Shaun Thomas wrote:

On 11/07/2012 09:16 PM, David Boreham wrote:


bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48


Unfortunately without -S, you're not really testing the processors. A 
regular pgbench can fluctuate a more than that due to writing and 
checkpoints.
Hmm...my goal was to test with a workload close to our application's 
(which is heavy OLTP: small transactions, and hence sensitive to I/O 
commit rate).
The hypothesis I was testing was that enabling HT positively degrades 
performance (which in my case it does not). I wasn't, to be honest, 
really testing the additional benefit from HT, rather observing that it 
is non-negative :)


If I have time I can run the select-only test for you and post the 
results. The DB fits into memory, so it will be a good CPU test.
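For reference, the select-only run differs from the TPC-B style invocation only by -S; a sketch matching the client/thread counts quoted elsewhere in this thread (the pgbench path is the PGDG 9.2 layout, and the echo keeps the sketch side-effect free):

```shell
# run_select_only [dbname...]: print the pgbench select-only command.
# -S makes pgbench issue only SELECTs, so with the data set resident in
# RAM the run is CPU-bound rather than commit-rate-bound.
# Drop the echo to actually execute it.
PGBENCH=${PGBENCH:-/usr/pgsql-9.2/bin/pgbench}
run_select_only() {
    echo "$PGBENCH" -S -T 600 -j 48 -c 48 "$@"
}

# Example:
# run_select_only pgbench
```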







Re: [PERFORM] HT on or off for E5-26xx ?

2012-11-07 Thread David Boreham

On 11/6/2012 9:16 PM, Mark Kirkwood wrote:



I've been benchmarking a E5-4640 (4 socket) and hyperthreading off 
gave much better scaling behaviour in pgbench (gentle rise and flatten 
off), whereas with hyperthreading on there was a dramatic falloff 
after approx number clients = number of (hyperthreaded) cpus. The box 
is intended to be a pure db server, so we are running with 
hyperthreading off.


It looks like this syndrome is not observed on my box, likely due to the 
much lower number of cores system-wide (12).
I see pgbench tps increase nicely until #threads/clients == #cores, then 
plateau. I tested up to 96 threads, btw.


We're waiting on more memory modules to arrive. I'll post some test 
results once we have all 4 memory banks populated.







Re: [PERFORM] HT on or off for E5-26xx ?

2012-11-07 Thread David Boreham

On 11/7/2012 6:37 AM, Devrim GÜNDÜZ wrote:

HT should be good for file servers, or many app servers, or small
web/mail servers. PostgreSQL relies on CPU power, and since the HT CPUs
don't have the same power as the original CPU, when the OS submits a job
to that particular HTed CPU, the query will run significantly slower. To
avoid issues, I would suggest you turn HT off on all PostgreSQL servers.
If you can throw in some more money, another 6-core CPU would give more
benefit.

I realize this is the received wisdom, but it is not supported by the 
evidence before me (which is that I get nearly 2x the throughput 
from pgbench using nthreads == nhtcores vs nthreads == nfullcores). 
Intel's latest HT implementation seems to suffer less from the kinds of 
resource-sharing contention issues seen in older generations.


Once I have the machine's full memory installed I'll run pgbench with HT 
disabled in the BIOS and post the results.









Re: [PERFORM] HT on or off for E5-26xx ?

2012-11-07 Thread David Boreham
Well, the results are in, and at least in this particular case 
conventional wisdom is overturned. Not a huge benefit, but throughput is 
definitely higher with HT enabled and nthreads > ncores:


HT off :

bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 48
number of threads: 48
duration: 600 s
number of transactions actually processed: 2435711
tps = 4058.667332 (including connections establishing)
tps = 4058.796309 (excluding connections establishing)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          52.50    0.00   14.79    5.07    0.00   27.64

Device:  rrqm/s   wrqm/s     r/s       w/s    rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00  5700.30    0.10  13843.50     0.00   74.78     11.06     48.46   3.50   0.05  65.21


HT on:

bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 48
number of threads: 48
duration: 600 s
number of transactions actually processed: 2832463
tps = 4720.668984 (including connections establishing)
tps = 4720.750477 (excluding connections establishing)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          40.61    0.00   12.71    3.09    0.00   43.59

Device:  rrqm/s   wrqm/s     r/s       w/s    rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00  6197.10   14.80  16389.50     0.14   86.53     10.82     54.11   3.30   0.05  82.35


System details:

E5-2620 (6-core + HT, 15 MB LLC), 64 GB RAM (4 channels with 16 GB 1333 
modules), Intel 710 300 GB (which is faster than the smaller drives, 
note), Supermicro X9SRi-F motherboard.
CentOS 6.3 64-bit, PG 9.2.1 from the PGDG RPM repository. pgbench 
running locally on the server.







[PERFORM] HT on or off for E5-26xx ?

2012-11-06 Thread David Boreham


I'm bringing up a new type of server using the Intel E5-2620 (single 
socket), which was selected for good SpecIntRate performance vs 
cost/power (201 for $410 and 95 W).


I was assuming it was 6-core, but I just noticed it has HT, which is 
currently enabled since I see 12 cores in /proc/cpuinfo.


Question for the performance experts : is it better to have HT enabled 
or disabled for this generation of Xeon ?
Workload will be moderately concurrent, small OLTP-type transactions. 
We'll also run a few low-load VMs (using KVM) and a big Java application.


Any thoughts welcome.
Thanks.










Re: [PERFORM] hardware advice

2012-10-02 Thread David Boreham

On 10/2/2012 2:20 AM, Glyn Astill wrote:

newer R910s recently all of a sudden went dead to the world; no prior symptoms
showing in our hardware and software monitoring, no errors in the os logs,
nothing in the dell drac logs.  After a hard reset it's back up as if
nothing happened, and it's an issue I'm none the wiser to the cause.  Not
good peace of mind.

This could be an OS bug rather than a hardware problem.








Re: [PERFORM] hardware advice

2012-09-28 Thread David Boreham

On 9/28/2012 9:46 AM, Craig James wrote:

Your best warranty would be to have the confidence to do your own
repairs, and to have the parts on hand.  I'd seriously consider
putting your own system together.  Maybe go to a few sites with
pre-configured machines and see what parts they use.  Order those,
screw the thing together yourself, and put a spare of each critical
part on your shelf.

This is what I did for years, but after taking my old parts collection 
to the landfill a few times, I realized I may as well just buy N+1 
machines and keep zero spares on the shelf. That way I get a spare 
machine available for use immediately, and I know the parts are working 
(parts on the shelf may be defective). If something breaks, I use the 
spare machine until the replacement parts arrive.


Note in addition that a warranty can be extremely useful in certain 
organizations as a vehicle of blame avoidance (this may be its primary 
purpose in fact). If I buy a bunch of machines that turn out to have 
buggy NICs, well that's my fault and I can kick myself since I own the 
company, stay up late into the night reading kernel code, and buy new 
NICs. If I have an evil Dilbertian boss, then well...I'd be seriously 
thinking about buying Dell boxes in order to blame Dell rather than 
myself, and be able to say everything is warrantied if badness goes 
down. Just saying...







Re: [PERFORM] hardware advice

2012-09-27 Thread David Boreham

On 9/27/2012 1:11 PM, M. D. wrote:


I want to buy a new server, and am contemplating a Dell R710 or the 
newer R720.  The R710 has the x5600 series CPU, while the R720 has the 
newer E5-2600 series CPU.


For this, the best data I've found (excepting actually running tests on 
the physical hardware) is the SpecIntRate2006 numbers, which can be 
found for both machines on the spec.org web site.


I think the newer CPU is the clear winner, with a SpecIntRate performance 
of 589 vs 432.
It also has a significantly larger cache. Comparing single-threaded 
performance, the older CPU is slightly faster (50 vs 48). That wouldn't 
be a big enough difference to make me pick it.


The Sandy Bridge-based machine will likely use less power.

http://www.spec.org/cpu2006/results/res2012q2/cpu2006-20120604-22697.html

http://www.spec.org/cpu2006/results/res2012q1/cpu2006-20111219-19272.html

To find more results use this page : 
http://www.spec.org/cgi-bin/osgresults?conf=cpu2006;op=form

(enter R710 or R720 in the system field).






Re: [PERFORM] hardware advice

2012-09-27 Thread David Boreham

On 9/27/2012 1:37 PM, Craig James wrote:

We use a white box vendor (ASA Computers), and have been very happy
with the results.  They build exactly what I ask for and deliver it in
about a week.  They offer on-site service and warranties, but don't
pressure me to buy them.  I'm not locked in to anything.  Their prices
are good.


I'll second that : we build our own machines from white-label parts for 
typically less than half the Dell list price. However, Dell does provide 
value to some people : for example, you can point a third-party software 
vendor at a Dell box and demand they make their application work 
properly, whereas they may turn their nose up at a white-label box. The 
same goes for operating systems : we have spent much time debugging Linux 
kernel issues on white-box hardware. On Dell hardware we would most 
likely not have hit those bugs, because Red Hat tests on Dell. So YMMV...









Re: [PERFORM] hardware advice

2012-09-27 Thread David Boreham

On 9/27/2012 1:56 PM, M. D. wrote:
I'm in Belize, so what I'm considering is from eBay, where it's 
unlikely that I'll get the warranty.  Should I consider some other 
brand instead? Building my own or buying custom might be an option too, 
but I would not get any warranty.

I don't have any recent experience with white-label system vendors, but 
I suspect they are assembling machines from Supermicro, Asus, Intel or 
Tyan motherboards and enclosures, which is what we do. You can buy the 
hardware from suppliers such as newegg.com. It takes some time to read 
the manufacturer's documentation, figure out what kind of memory to buy 
and so on, which is basically what you're paying a white-label box 
seller to do for you.


For example, here's a similar barebones system to the R720 that I found 
with a couple of minutes' searching on newegg.com : 
http://www.newegg.com/Product/Product.aspx?Item=N82E16816117259
You could order that SKU, plus the two CPUs, however many memory 
sticks you need, and drives. If you need less RAM (the Dell box allows 
up to 24 sticks) there are probably cheaper options.


The equivalent Supermicro box looks to be somewhat less expensive : 
http://www.newegg.com/Product/Product.aspx?Item=N82E16816101693


When you consider downtime and the cost to ship equipment back to the 
supplier, a warranty doesn't have much value to me but it may be useful 
in your situation.







Re: [PERFORM] hardware advice

2012-09-27 Thread David Boreham

On 9/27/2012 2:55 PM, Scott Marlowe wrote:

Whatever you do, go for the Intel ethernet adaptor option. We've had so many
headaches with integrated broadcom NICs.:(
Sound advice, but not a get-out-of-jail card, unfortunately : we had a 
horrible problem with the Intel e1000 driver in RHEL for several 
releases.

We finally diagnosed it just as Red Hat shipped a fixed driver.






Re: [PERFORM] hardware advice

2012-09-27 Thread David Boreham

On 9/27/2012 2:47 PM, Shaun Thomas wrote:

On 09/27/2012 02:40 PM, David Boreham wrote:


I think the newer CPU is the clear winner with a specintrate
performance of 589 vs 432.


The comparisons you linked to had 24 absolute threads pitted against 
32, since the newer CPUs have a higher maximum cores per CPU. That 
said, you're right that it has a fairly large cache. And from my 
experience, Intel CPU generations have been scaling incredibly well 
lately. (Opteron, we hardly knew ye!)
Yes, the rate spec test uses all the available cores. I'm assuming a 
concurrent workload, but since the single-thread performance isn't that 
much different between the two, I think the higher core count, 
larger cache, newer-design CPU is the best choice.


We went from Dunnington to Nehalem, and it was stunning how much 
better the X5675 was compared to the E7450. Sandy Bridge isn't quite 
that much of a jump though, so if you don't need that kind of 
bleeding-edge, you might be able to save some cash. This is especially 
true since the E5-2600 series has the same TDP profile and both use 
32nm lithography.
We use Opteron on a price/performance basis. Intel always seems to come 
up with some way to make their low-cost processors useless (such as 
limiting the amount of memory they can address).









Re: [PERFORM] hardware advice

2012-09-27 Thread David Boreham

On 9/27/2012 3:16 PM, Claudio Freire wrote:

Careful with AMD, since many (I'm not sure about the latest ones)
cannot saturate the memory bus when running single-threaded. So, great
if you have a high concurrent workload, quite bad if you don't.


Actually, we test memory bandwidth with John McCalpin's STREAM program.
Unfortunately it is hard to find STREAM results for recent machines, so 
it can be hard to compare two boxes unless you own examples of both; 
that's why I didn't mention it as a useful option. But if you can find 
results for the machines, or ask a friend to run it for you, it's 
definitely useful information.








Re: [PERFORM] SSD selection

2012-05-16 Thread David Boreham

On 5/16/2012 11:01 AM, Merlin Moncure wrote:

Although your assertion is 100% supported by Intel's marketing numbers,
there are some contradicting numbers out there that show the drives
offering pretty similar performance.  For example, look here:
http://www.anandtech.com/show/4902/intel-ssd-710-200gb-review/4  and
you can see that 4k-aligned writes give quite similar results
(14k IOPS) even though the 710 is rated for only 2700 IOPS while the
320 is rated for 21000 IOPS.  Other benchmarks also show similar
results.
Actually, I said the same thing you're saying : that the two series will 
deliver similar performance.

The spec numbers, however, would be for worst-case conditions (in the 
case of the 710).
I'm not convinced that those tests were exercising the worst-case part 
of the envelope.






Re: [PERFORM] SSD selection

2012-05-15 Thread David Boreham

On 5/15/2012 9:21 AM, Віталій Тимчишин wrote:



We've reached to the point when we would like to try SSDs. We've got a 
central DB currently 414 GB in size and increasing. Working set does 
not fit into our 96GB RAM server anymore.

So, the main question is what to take. Here what we've got:
1) Intel 320. Good, but slower than current-generation SandForce drives
2) Intel 330. Looks like a cheap 520 without a capacitor
3) Intel 520. Faster than the 320, but no capacitor.
4) OCZ Vertex 3 Pro - not available, even on OCZ's site
5) OCZ Deneva - can't find in my country :)



Is the 710 series too costly for your deployment ?
I ask because that would be the obvious choice for a database (much 
better write endurance than any of the drives above, and less likely to 
suffer from firmware bugs or unpleasant GC behavior).
We've been running them in production for a few months with zero 
problems and great performance.
The price on the 710's tends to vary on whether they're in stock : 
NewEgg is currently showing $1100 for the 300G drive, but no stock.






Re: [PERFORM] SSD selection

2012-05-15 Thread David Boreham

On 5/15/2012 12:16 PM, Rosser Schwarz wrote:

As the other posters in this thread have said, your best bet is
probably the Intel 710 series drives, though I'd still expect some
320-series drives in a RAID configuration to still be pretty
stupendously fast.
One thing to mention is that the 710s are not faster than the 320 series
(unless your definition of "fast" includes avoiding potential GC pauses, of course).
The 710's primary claim to fame is that it has endurance and GC 
characteristics designed for server and database use (constant load, 
heavy write load).


So 320 drives will be just as fast, if not faster, but they will wear 
out much more quickly (possibly not a concern for the OP in his 
deployment) and may suffer from unwelcome GC pauses.







Re: [PERFORM] Intel 710 pgbench write latencies

2011-11-02 Thread David Boreham




So the Intel 710 kind of sucks latency wise. Is it because it is also 
heavily reading, and maybe WAL should not be put on it?


A couple quick thoughts:

1. There are a lot of moving parts in the system besides the SSDs.
It will take some detailed analysis to determine the cause for the
outlying high latency transactions. The cause may not be as simple
as one SSD processes I/O operations less quickly than another.
For example the system may be subject to some sort of
starvation issue in PG or the OS that is affected by quite
small differences in underlying storage performance.

2. What are your expectations for maximum transaction latency ?
In my experience it is not possible to guarantee sub-second
(or even sub-multi-second) latencies overall in a system
built with general purpose OS and database software.
(put another way : a few outlying 1 second and even
several-second transactions would be pretty much what
I'd expect to see on a database under sustained saturation
load as experienced under a pgbench test).






Re: [PERFORM] Choosing between Intel 320, Intel 510 or OCZ Vertex 3 SSD for db server

2011-10-28 Thread David Boreham

On 10/28/2011 12:40 AM, Amitabh Kant wrote:



Sadly, 710 is not that easily available around here at the moment.



All three sizes are in stock at newegg.com, if you have a way to export 
from the US to your location.






Re: [PERFORM] WAL in RAM

2011-10-28 Thread David Boreham

On 10/28/2011 12:26 PM, Tomas Vondra wrote:

For example the Intel 710 SSD has a sequential write speed of 210MB/s,
while a simple SATA 7.2k drive can write about 50-100 MB/s for less than
1/10 of the 710 price.

Bulk data transfer rates mean almost nothing in the context of a database
(unless you're for example backing it up by copying the files to another 
machine...)
The key factor typically is small-block writes/s (for WAL) and random
small-block reads/s (for data). 710 or similar-performance SSDs will deliver
on the order of 20-50x the performance of a traditional hard drive in these
areas.





Re: [PERFORM] Choosing between Intel 320, Intel 510 or OCZ Vertex 3 SSD for db server

2011-10-25 Thread David Boreham

On 10/25/2011 8:55 AM, Claudio Freire wrote:
But what about unexpected failures. Faulty electronics, stuff like 
that? I really don't think a production server can work without at 
least raid-1. 


Same approach : a server either works or it does not. The transition 
between working and not working may be expected or not expected. The 
solution is the same : use another machine to perform the work the now 
not working machine was doing. The big benefit of this approach is that 
you now do not need to worry about specific kinds of failure or 
individual components, including storage.


If it helps, think of this architecture as raid-1, but with the whole 
machine being the drive rather than individual storage devices.








Re: [PERFORM] Choosing between Intel 320, Intel 510 or OCZ Vertex 3 SSD for db server

2011-10-24 Thread David Boreham


A few quick thoughts:

1. 320 would be the only SSD I'd trust from your short-list. It's the 
only one with proper protection from unexpected power loss.
2. Multiple RAID'ed SSDs sounds like (vast) overkill for your workload. 
A single SSD should be sufficient (will get you several thousand TPS on 
pgbench for your DB size).
3. Consider not using the magnetic disks at all (saves on space, power 
and the cost of the RAID controller for them).
4. Consider using Intel 710 series rather than 320 (pay for them with 
the money saved from #3 above). Those devices have much, much higher 
specified endurance than the 320s and since your DB is quite small you 
only need to buy one of them.


On 10/24/2011 8:09 AM, Amitabh Kant wrote:

Hello

I need to choose between Intel 320 , Intel 510 and OCZ Vertex 3 SSD's 
for my database server. From recent reading in the list and other 
places, I have come to understand that OCZ Vertex 3 should not be 
used, Intel 510 uses a Marvel controller while Intel 320 had a nasty 
bug which has been rectified. So the list narrows down to only 510 and 
320, unless I have understood the OCZ Vertex reviews incorrectly.


The server would itself be built along these lines: Dual CPU Xeon 
5620, 32 or 48 GB RAM, 2 SAS 10K disk in RAID 1 for OS, 2 SAS 10K disk 
in RAID 1 for pg_xlog and 4 SSD in RAID 10 for data directory 
(overkill??). OS would be FreeBSD 8.2 (I would be tuning the sysctl 
variables). PG version would be 9.1 with replication set to another 
machine (Dual CPU Xeon 54xx, 32 GB RAM, 6 15K SAS 146 GB: 4 in RAID 10 
for data and 2 in RAID 1 for OS + pg_xlog). The second machine hosts 
my current db , and there is not much of an issue with the 
performance. We need better redundancy now(current was to take a 
dump/backup every 12 hours), so the new machine.


My database itself is not very big, approx 40 GB as of now, and would 
not grow beyond 80 GB in the next year or two. There are some tables 
where inserts & updates are fairly frequent. From what I could gather, 
we are not doing more than 300-400 tps at the moment, and the growth 
should not be very high in the short term.


Hope someone can give some pointers to which SSD I should go for at 
the moment.



Amitabh






Re: [PERFORM] Choosing between Intel 320, Intel 510 or OCZ Vertex 3 SSD for db server

2011-10-24 Thread David Boreham

On 10/24/2011 3:31 PM, Merlin Moncure wrote:

4. Consider using Intel 710 series rather than 320 (pay for them with the
  money saved from #3 above). Those devices have much, much higher specified
  endurance than the 320s and since your DB is quite small you only need to
  buy one of them.

710 is good idea if and only if you are worried about write durability
(in which case it's a great idea).


I disagree with this (that it is the only reason to select the 710 series).
The write endurance (as specified, at least) is orders of magnitude higher.
Doing hundreds of TPS constantly, it is possible to burn through the 320's
endurance lifetime in a year or two.
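To put rough numbers on that claim, here is an illustrative back-of-envelope
calculation. Every input is an assumption for the sake of the arithmetic
(drive size, the ~100-overwrite rating discussed elsewhere in this thread,
and the per-transaction write volume), not a measurement:

```python
# Illustrative endurance arithmetic for a 320-class drive under a constant
# small-block write load. All inputs are assumptions, not measurements.
capacity_gb = 300
rated_overwrites = 100                 # assumed full-drive-overwrite rating
endurance_tb = capacity_gb * rated_overwrites / 1000.0

tps = 300                              # "100's of TPS"
bytes_per_txn = 8192                   # assume roughly one 8K page per commit
write_mb_per_s = tps * bytes_per_txn / 1e6

days = endurance_tb * 1e12 / (write_mb_per_s * 1e6) / 86400
print(f"~{endurance_tb:.0f} TB rated endurance, gone in ~{days:.0f} days at constant load")
```

At these assumed inputs the rated endurance is consumed in well under a year
of truly constant load; a bursty production workload with a modest duty cycle
stretches that toward the "year or two" above (and write amplification pushes
the other way).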





Re: [PERFORM] Choosing between Intel 320, Intel 510 or OCZ Vertex 3 SSD for db server

2011-10-24 Thread David Boreham

On 10/24/2011 4:47 PM, Claudio Freire wrote:

What about redundancy?

How do you swap an about-to-die SSD?

Software RAID-1?


The approach we take is that we use 710 series devices which have 
predicted reliability similar to all the other components in the 
machine, therefore the unit of replacement is the entire machine. We 
don't use trays for example (which saves quite a bit on data center 
space). If I were running short endurance devices such as 320 series I 
would be interested in replacing the drives before the machine itself is 
likely to fail, but I'd do so by migrating the data and load to another 
machine for the replacement to be done offline. Note that there are 
other operations procedures that need to be done and can not be done 
without downtime (e.g. OS upgrade), so some kind of plan to deliver 
service while a single machine is down for a while will be needed 
regardless of the storage device situation.








[PERFORM] Intel 710 Endurance Test Results

2011-10-05 Thread David Boreham


I ran a test using Intel's timed workload wear indication feature on a 
100G 710 series SSD.


The test works like this : you reset the wear indication counters, then 
start running
some workload (in my case pgbench at scale 1000 for 4 hours). During the 
test run
a wear indication attribute can be read via smartctl. This contains the 
drive's estimate
for the total drive wear lifetime used during the test run time. 
Provided we trust
the drive's estimate, this in turn provides an answer to the interesting question
"how long before my drive wears out in production?" That is, provided the test
workload can be reasonably accurately related to the production workload.

Executive summary : the drive says that we used 0.025% of its wear life
during this 4h test. The test performed 47 million transactions on a
roughly 17G database.

We think that this pgbench workload performs transactions at roughly
10X the rate we expect from our application in production under heavy load.
So the drive says you could run this pgbench test for about two years
before wearing out the flash devices. Or 20 years under our expected 
production

workload.
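The projection is simple arithmetic; a sketch (the 0.025% and 4-hour figures
come from this post, and the 10x production ratio is the author's estimate
above):

```python
# Project wear life from the timed-workload result: 0.025% of wear life
# consumed during a 4-hour pgbench run, production load ~1/10th of that.
wear_fraction = 0.025 / 100            # fraction of wear life used in the test
test_hours = 4.0

life_hours = test_hours / wear_fraction            # hours to 100% wear at test load
years_at_test_load = life_hours / (24 * 365)
years_in_production = years_at_test_load * 10      # production ~1/10th the rate

print(f"{years_at_test_load:.1f} years at pgbench load, "
      f"~{years_in_production:.0f} years at expected production load")
```

This reproduces the rounded "about two years" / "20 years" figures quoted
above (the exact arithmetic gives roughly 1.8 and 18 years).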

Smart attributes read just after test completion:

[root@server1 9.1]# smartctl -A /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-71.29.1.el6.x86_64] 
(local build)

Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline  -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline  -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always   -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always   -       90
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -       8
170 Reserve_Block_Count     0x0033   100   100   010    Pre-fail  Always   -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always   -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always   -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always   -       4
183 SATA_Downshift_Count    0x0030   100   100   000    Old_age   Offline  -       0
184 End-to-End_Error        0x0032   100   100   090    Old_age   Always   -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -       0
190 Airflow_Temperature_Cel 0x0032   073   070   000    Old_age   Always   -       27 (Min/Max 17/30)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always   -       4
194 Temperature_Celsius     0x0032   100   100   000    Old_age   Always   -       32
199 UDMA_CRC_Error_Count    0x0030   100   100   000    Old_age   Offline  -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always   -       51362
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always   -       26
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always   -       1
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always   -       242
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always   -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always   -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always   -       51362
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always   -       822


pgbench output from the test run:

bash-4.1$ /usr/pgsql-9.1/bin/pgbench -T 14400 -j 8 -c 64
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1000
query mode: simple
number of clients: 64
number of threads: 8
duration: 14400 s
number of transactions actually processed: 47205109
tps = 3278.127690 (including connections establishing)
tps = 3278.135396 (excluding connections establishing)





Re: [PERFORM] Suggestions for Intel 710 SSD test

2011-10-02 Thread David Boreham

On 10/1/2011 10:00 PM, Gregory Gerard wrote:
How does this same benchmark compare on similar (or same) hardware but 
with magnetic media?


I don't have that data at present :(

So far I've been comparing performance with our current production 
machines, which are older.
Those machines use 'raptors with no BBU/WBC, so their tps performance is 
very low (100 per second or so).
Since our application tends to be limited by the transaction commit 
rate, the SSD-based machine
is clearly going to be at least an order of magnitude faster. But I 
don't know its relative performance
vs. the traditional BBU/WBC controller + string of drives approach (we 
rejected that as an option already

for power consumption and space reasons).

I could install a raptor drive in the new server and test that, but it 
wouldn't have a BBU/WBC controller.
I guess I could enable the drive's write cache which might produce 
comparable results to a controller's WBC.






Re: [PERFORM] Suggestions for Intel 710 SSD test

2011-10-02 Thread David Boreham

On 10/1/2011 9:22 PM, Andy wrote:
Do you have an Intel 320?  I'd love to see tests comparing 710 to 320 
and see if it's worth the price premium.


Good question. I don't have a 320 drive today, but will probably get one 
for testing soon.


However, my conclusion based on the Intel spec documents is that the 710 
and 320 will have similar performance.
We elected to use 710 devices not for performance reasons vs the 320 but 
for the specified lifetime endurance.
The 710 series is specified at around 4k complete drive overwrites where 
the 320 is specified at only
around 100. So I see the 710 series as "like the 320, but much less likely
to wear out". It may also have
better behavior under constant load (the white paper makes vague mention 
of different background GC

vs the non-enterprise drives).

So for our uses, the 320 series looks great except for two concerns:
a) they may wear out quite quickly, leading to extra cost to enter the data
center and pull drives, etc., and the need to maintain a long-term test rig
to determine if and when they wear out before it happens in production;
b) the GC may behave badly under constant load (leading, for example, to
unexpected periods of relatively very poor performance).


The value proposition for the 710 vs. 320 for me is not performance but 
the avoidance of these two worries.






Re: [PERFORM] Suggestions for Intel 710 SSD test

2011-10-02 Thread David Boreham

On 10/2/2011 2:33 AM, Arjen van der Meijden wrote:



Given the fact that you can get two 320's for the price of one 710, 
its probably always a bit difficult to actually make the choice 
(unless you want a fixed amount of disks and the best endurance 
possible for that).


One thing I'd add to this is that the price/bit is more like 4X ($2k for 
the 300G 710 vs $540 for the 300G 320).
The largest 710 drive is 300G whereas the largest 320 is 600G which may 
imply that the 710's are twice
as over-provisioned as the 320. It may be that at present we're paying 
2x for the relative over-provisioning
and another 2x to enjoy the better silicon and firmware. This hopefully 
implies that prices will fall

in the future provided a credible competitor emerges (Hitachi??).









Re: [PERFORM] Suggestions for Intel 710 SSD test

2011-10-02 Thread David Boreham

On 10/2/2011 6:26 PM, Gregory Gerard wrote:

If I may ask what were your top three candidates before choosing the intel?
All the other options considered viable were using traditional 
rotational disks.

I personally don't have any confidence in the other SSD vendors today,
except perhaps for FusionIO (where a couple of old friends work, and I 
can certainly vouch for
their competence) but their products are too costly for our application 
at present.


Also why not just plan a graceful switch to a replicated server? At some point 
you have to detect the drive is about to go (or it just goes without warning). 
Presumably that point will be in a while and be coordinated with an upgrade 
like 9.2 in a year.
Sure, we have this capability but once you walk through what has to 
happen if you are burning through
SSDs every few months, the 710 value proposition is more attractive for 
us. For example our data center
is 1200 miles from our HQ and it takes a very long road trip or a plane 
flight to get hands-on with the
boxes. We spent some considerable time planning for the 320 style 
deployment to be honest --
figuring out how to predict when the drive would wear out, building 
replication mechanisms that
would cope gracefully and so on. But given the option of the 710 where 
wear out can essentially
be crossed off the list of things to worry about, that's the way we 
decided to go.


Finally why not the pci based cards?


A few reasons: 1) Today Intel doesn't make them (that will change soon);
2) a desire to maintain backwards compatibility at least for this 
generation, on a system architecture level
with traditional disk drives, 3) concerns about mechanical integrity and 
system airflow issues with the
PCI card and connector in 1U enclosures. The SSD fits into the same 
location as a traditional disk

but can be velcro'ed down rather than bolted for easier field replacement.





Re: [PERFORM] Suggestions for Intel 710 SSD test

2011-10-02 Thread David Boreham

On 10/2/2011 10:35 PM, Greg Smith wrote:


That sounds about the same performance as the 320 drive I tested 
earlier this year then.  You might try duplicating some of the 
benchmarks I ran on that:  
http://archives.postgresql.org/message-id/4d9d1fc3.4020...@2ndquadrant.com
Thanks. Actually I had been looking for that email, which my brain 
remembered but my computer and Google could not find ;)


Make sure to reference the capacity of the drive though.  The 320 
units do scale their performance based on that, presumably there's 
some of that with the 710s as well.
This is a 100G drive. The performance specs vary curiously vs capacity 
for the 710 series : write tps actually goes down as the size increases, 
but bulk write data rate is higher for the larger drives.


I ran some pgbench earlier this evening, before reading your old email 
above, so the parameters are different:


This is the new server with 100G 710 (AMD 6128 with 64G):

bash-4.1$ /usr/pgsql-9.1/bin/pgbench -T 600 -j 8 -c 64
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 8
duration: 600 s
number of transactions actually processed: 2182014
tps = 3636.520405 (including connections establishing)
tps = 3636.738290 (excluding connections establishing)

This is the output from iostat while the test runs:
Device:         rrqm/s   wrqm/s     r/s        w/s    rMB/s    wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda               0.00  4851.00    0.00   14082.00     0.00    73.76    10.73    45.72   3.25   0.05  71.60


This is our current production server type (Dual AMD 2346HE 32G 10K 300G 
'raptor) with disk write cache turned off and with data and wal on the 
same drive:


bash-3.2$ /usr/pgsql-9.1/bin/pgbench -T 600 -j 8 -c 64
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 8
duration: 600 s
number of transactions actually processed: 66426
tps = 108.925653 (including connections establishing)
tps = 108.941509 (excluding connections establishing)

Device:         rrqm/s   wrqm/s    r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   438.00   0.00  201.00     0.00     2.60    26.47    55.95  286.09   4.98 100.00


same server with disk write cache turned on:

bash-3.2$ /usr/pgsql-9.1/bin/pgbench -T 600 -j 8 -c 64
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 8
duration: 600 s
number of transactions actually processed: 184724
tps = 305.654008 (including connections establishing)
tps = 305.694317 (excluding connections establishing)

Device:         rrqm/s   wrqm/s    r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   668.00   0.00  530.00     0.00     4.55    17.57   142.99  277.28   1.89 100.10


There are some OS differences between the old and new servers : old is 
running CentOS 5.7 while the new is running 6.0.
Old server has atime enabled while new has relatime mount option 
specified. Both are running PG 9.1.1 from the yum repo.


One very nice measurement is the power consumption on the new server: peak 
dissipation is 135W under the pgbench load (measured on the AC input to the 
PSU). Idle draw is around 90W.








Re: [PERFORM] Suggestions for Intel 710 SSD test

2011-10-02 Thread David Boreham

On 10/2/2011 10:49 PM, David Boreham wrote:


There are some OS differences between the old and new servers : old is 
running CentOS 5.7 while the new is running 6.0.
Old server has atime enabled while new has relatime mount option 
specified. Both are running PG 9.1.1 from the yum repo.



Also the old server is using ext3 while the new has ext4 (with 
discard/trim enabled).







Re: [PERFORM] Suggestions for Intel 710 SSD test

2011-10-02 Thread David Boreham

On 10/2/2011 11:55 PM, Gregory Gerard wrote:

Which repo did you get them from?



http://yum.postgresql.org/9.1/redhat/rhel-$releasever-$basearch





[PERFORM] Suggestions for Intel 710 SSD test

2011-10-01 Thread David Boreham


I have a 710 (Lyndonville) SSD in a test server. Ultimately we'll run
capacity tests using our application (which in turn uses PG), but it'll
take a while to get those set up. In the meantime, I'd be happy to
entertain running whatever tests folks here would like to suggest,
spare time-permitting.

I've already tried bonnie++, sysbench and a simple WAL emulation
test program I wrote more than 10 years ago.  The drive tests at
around 160Mbyte/s on bulk data and 4k tps for commit rate writing
small blocks.
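The WAL emulation program itself isn't shown here, but the idea is easy to
sketch: append a small block and fsync after every write, the way a database
commits. A minimal Python version (the file name and parameters are
placeholders, not the original program):

```python
import os
import time

def commit_rate(path, block_size=4096, ops=500):
    """Append small blocks with an fsync after each write, mimicking a
    database WAL commit; returns fsync'd writes per second."""
    buf = os.urandom(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.time()
        for _ in range(ops):
            os.write(fd, buf)
            os.fsync(fd)            # force the block to stable storage
        elapsed = time.time() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return ops / elapsed

if __name__ == "__main__":
    print(f"{commit_rate('waltest.bin'):.0f} fsync'd 4K writes/s")
```

Run it on the filesystem under test; on a drive with no write-back cache the
result is bounded by the physical commit latency, which is what the 4k tps
figure above is measuring.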





[PERFORM] Intel 320 SSD info

2011-08-24 Thread David Boreham


Apologies if this has already been posted here (I hadn't seen it before 
today, and

can't find a previous post).
This will be of interest to anyone looking at using SSDs for database 
storage :

http://www.intel.com/content/www/us/en/solid-state-drives/ssd-320-enterprise-server-storage-application-specification-addendum.html





Re: [PERFORM] Intel 320 SSD info

2011-08-24 Thread David Boreham

On 8/24/2011 11:17 AM, Merlin Moncure wrote:


hm, I think they need to reconcile those numbers with the ones on this
page: 
http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-320-series.html

600 write IOPS vs 3.7k/23k.




They do provide an explanation (and what I find interesting about this 
document is that they are basically coming clean about the real worst-case 
performance, which is refreshing and encouraging). The difference is that the 
high number is achieved if the drive does not need to perform a block erase 
to process the write (this is true most of the time, since the capacity is 
over-provisioned and there is an expectation that GC will have generated free 
blocks in the background). The low number is the performance under worst-case 
conditions where the drive is a) full, b) no blocks have been trimmed, and 
c) GC wasn't able to run yet.


I suspect that in production use it will be possible to predict in 
advance when the drive is approaching the point where it will run out of 
free blocks, and hence perform poorly. Whether or not this is possible 
is a big question for us in planning our transition to SSDs in production.
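One hedged approach to that prediction would be to poll the drive's SMART
attributes over time. The attribute IDs below are taken from the 710 smartctl
dump posted elsewhere in this thread; a 320 may well report different IDs:

```python
def parse_smart_attrs(smartctl_text, ids=(226, 232, 233)):
    """Pull raw values for selected attribute IDs out of
    'smartctl -A' output; returns {id: raw_value}."""
    values = {}
    for line in smartctl_text.splitlines():
        parts = line.split()
        if parts and parts[0].isdigit() and int(parts[0]) in ids:
            values[int(parts[0])] = int(parts[-1])
    return values

# Sample lines as they appear in the 710 post in this thread:
sample = """\
226 Workld_Media_Wear_Indic 0x0032 100 100 000 Old_age Always - 26
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0"""
print(parse_smart_attrs(sample))   # {226: 26, 232: 0, 233: 0}
```

Graphing those raw values over weeks of production load would show whether
the drive is trending toward the worst-case regime before it actually gets
there.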


Anyone using SSDs should be aware of how they work and the possible 
worst case performance. This article helps with that !








Re: [PERFORM] Intel 320 SSD info

2011-08-24 Thread David Boreham

On 8/24/2011 11:23 AM, Andy wrote:

According to the specs for database storage:

Random 4KB writes: Up to 600 IOPS

Is that for real? 600 IOPS is *atrociously terrible* for an SSD. Not 
much faster than mechanical disks.



The underlying (Flash block) write rate really is terrible (and slower 
than most rotating disks).


The trick with SSDs is that the firmware performs all kinds of stunts to 
make the performance seen by the OS much higher (most of the time!). This is 
akin to write-back caching in a RAID controller, for example, where much 
higher write rates than the physical drives support are achievable.





Re: [PERFORM] Intel 320 SSD info

2011-08-24 Thread David Boreham

On 8/24/2011 11:41 AM, Greg Smith wrote:



I've measured the performance of this drive from a couple of 
directions now, and it always comes out the same.  For PostgreSQL, 
reading or writing 8K blocks, I'm seeing completely random workloads 
hit a worst-case of 20MB/s; that's just over 2500 IOPS.  It's quite 
possible that number can go lower under pressure of things like 
internal drive garbage collection however, which I believe is going 
into the 600 IOPS figure.  I haven't tried to force that yet--drive is 
too useful to me to try and burn it out doing tests like that at the 
moment.


I hope someone from Intel is reading -- it would be well worth their 
while to just send you a few drives,
since you are set up to perform the right test, and can provide 
impartial results.






Re: [PERFORM] Reports from SSD purgatory

2011-08-24 Thread David Boreham

On 8/24/2011 1:32 PM, Tomas Vondra wrote:
Why is that important? It's simply a failure of electronics and it has 
nothing to do with the wear limits. It simply fails without prior 
warning from the SMART.


In the cited article (actually in all articles I've read on this 
subject), the failures were not properly analyzed*.
Therefore the conclusion that the failures were of electronics 
components is invalid.
In the most recent article, people have pointed to it as confirming 
electronics failures
but the article actually states that the majority of failures were 
suspected to be

firmware-related.

We know that a) there have been failures, but b) not the cause.

We don't even know for sure that the cause was not cell wear.
That's because all we know is that the drives did not report
wear before failing. The wear reporting mechanism could be broken for 
all we know.


--
*A proper analysis would involve either the original manufacturer's FA 
lab, or a qualified independent analysis lab.






Re: [PERFORM] RAID Controllers

2011-08-23 Thread David Boreham

On 8/23/2011 5:14 AM, Robert Schnabel wrote:


I'm by no means an expert but it seems to me if you're going to choose 
between two 6 GB/s cards you may as well put SAS2 drives in.  I have 
two Adaptec 6445 cards in one of my boxes and several other Adaptec 
series 5 controllers in others.  They suit my needs and I haven't had 
any problems with them.  I think it has been mentioned previously but 
they do tend to run hot so plenty of airflow would be good.


Thanks. Good point about airflow. By SAS I meant 6Gbit SAS drives. But we 
have many servers already with 10K 'Raptors and it is tempting to use those, 
since we would be able to use a common pool of spare drives across all 
servers. 15K rpm is tempting though. I'm not sure if the DB transaction 
commit rate scales linearly with spindle speed when a BBU is in use (it 
would without a BBU).
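The no-cache case is easy to reason about: each synchronous commit costs
roughly one platter rotation, so the ceiling tracks spindle speed directly. A
sketch of that arithmetic (a simplification that ignores group commit and
controller caching):

```python
# Approximate per-spindle commit ceiling without any write-back cache:
# at best one commit per platter rotation.
for rpm in (7200, 10000, 15000):
    commits_per_sec = rpm / 60.0
    print(f"{rpm:>5} rpm -> ~{commits_per_sec:.0f} commits/s ceiling")
```

The ~167/s ceiling for a 10K drive is consistent with the roughly 100 tps
observed on the 'Raptor-based production boxes mentioned earlier in this
thread; with a working BBU/WBC the controller acknowledges commits from
cache, so the rotational ceiling no longer applies directly.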






Re: [PERFORM] RAID Controllers

2011-08-23 Thread David Boreham

On 8/22/2011 10:55 PM, Scott Marlowe wrote:

If you're running linux and thus stuck with the command line on the
LSI, I'd recommend anything else.  MegaRAID is the hardest RAID
control software to use I've ever seen.  If you can spring for the
money, get the Areca 1680:
http://www.newegg.com/Product/Product.aspx?Item=N82E16816151023  Be
sure and get the battery unit for it.  You can configure it from an
external ethernet connector very easily, and the performance is
outstandingly good.

Thanks. I took a look at Areca. The fan on the controller board is a big
warning signal for me (those fans are in my experience the single most
unreliable component ever used in computers).

Can you say a bit more about the likely problems with the CLI ?
I'm thinking that I configure the card once, and copy the config
to all the other boxes, so even if it's as obscure as Cisco IOS,
how bad can it be ? Is the concern more with things like a rebuild;
monitoring for drive failures -- that kind of constant management
task ?

How about Adaptec on Linux ? The supercapacitor and NAND
flash idea looks like a good one, provided the firmware doesn't
have bugs (true with any write back controller though).





[PERFORM] RAID Controllers

2011-08-22 Thread David Boreham


I'm buying a bunch of new machines (all will run an application that heavily
writes to PG). These machines will have 2 spindle groups in a RAID-1 config.
Drives will be either 15K SAS, or 10K SATA (I haven't decided if it is 
better to buy the faster drives, or drives that are identical to the ones we are
already running in our production servers, thus achieving commonality in
spares across all machines).

Controller choice looks to be between Adaptec 6405, with the 
supercapacitor unit; or LSI 9260-4i with its BBU. Price is roughly the same.

Would be grateful for any thoughts on this choice.

Thanks.





Re: [PERFORM] Postgres refusing to use 1 core

2011-05-11 Thread David Boreham

On 5/11/2011 9:17 PM, Aren Cambre wrote:


So here's what's going on.


snip

If I were doing this, considering the small size of the data set, I'd 
read all the data into memory.
Process it entirely in memory (with threads to saturate all the 
processors you have).

Then write the results to the DB.
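The suggested shape, as a minimal sketch (Python purely for illustration; `process` is a stand-in for the real per-record work, and note that in CPython a CPU-bound workload would want a process pool rather than threads to actually saturate all cores):

```python
from concurrent.futures import ThreadPoolExecutor

def process(record):
    # stand-in for the real per-record computation
    return record * 2

def process_all(records, workers=4):
    # read everything into memory first, fan the work out across workers,
    # then the results can be written to the DB in one batch at the end
    data = list(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, data))

results = process_all(range(8))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

The point is that the database is touched only twice: one bulk read at the start and one bulk write at the end, with no per-row round trips in between.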





Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread David Boreham



hm, if it was me, I'd write a small C program that just jumped
directly on the device around and did random writes assuming it wasn't
formatted.  For sequential read, just flush caches and dd the device
to /dev/null.  Probably someone will suggest better tools though.
I have a program I wrote years ago for a purpose like this. One of the 
things it can
do is write to the filesystem at the same time as dirtying pages in a 
large shared
or non-shared memory region. The idea was to emulate the behavior of a 
database
reasonably accurately. Something like bonnie++ would probably be a good 
starting point these days though.
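A minimal sketch of such a random-write tester (Python here for illustration rather than C; the file size, block size, and op count are arbitrary parameters you would tune to the device under test):

```python
import os
import random
import time

def random_write_bench(path, file_size=1 << 20, block=4096, ops=64, seed=0):
    """Time random block-sized writes with an fsync after each -- roughly
    the I/O pattern of database data files under a write-heavy load."""
    rng = random.Random(seed)
    payload = b"\xaa" * block
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)
        start = time.perf_counter()
        for _ in range(ops):
            # seek to a random block-aligned offset and do a synced write
            os.lseek(fd, rng.randrange(file_size // block) * block, os.SEEK_SET)
            os.write(fd, payload)
            os.fsync(fd)
        return ops / (time.perf_counter() - start)  # synced writes per second
    finally:
        os.close(fd)
```

Pointing `path` at a file on the device (or at the raw device node, if unformatted and you have permission) gives a quick read on its synchronous random-write throughput.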





Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread David Boreham

On 5/9/2011 3:11 PM, Merlin Moncure wrote:

The problem with bonnie++ is that the results aren't valid, especially
the read tests.  I think it refuses to even run unless you set special
switches.


I only care about writes ;)

But definitely, be careful with the tools. I tend to prefer small 
programs written in house myself, and of course simply running your 
application under a synthesized load.







Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread David Boreham
Had to say a quick thanks to Greg and the others who have posted 
detailed test results on SSDs here.
For those of us watching for the inflection point where we can begin the 
transition from mechanical to solid state storage, this data and 
experience is invaluable. Thanks for sharing it.


A short story while I'm posting : my Dad taught electronics engineering 
and would often visit the local factories with groups of students. I 
remember in particular after a visit to a disk drive manufacturer 
(Burroughs), in 1977 he came home telling me that he'd asked the plant 
manager what their plan was once solid state storage made their products 
obsolete. The manager looked at him like he was from another planet...


So I've been waiting patiently 34 years for this hopefully 
soon-to-arrive moment ;)






Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread David Boreham

On 4/6/2011 9:19 PM, gnuo...@rcn.com wrote:

SSDs have been around for quite some time.  The first that I've found is Texas 
Memory.  Not quite 1977, but not flash either, although they've been doing so 
for a couple of years.
Well, I built my first ram disk (which of course I thought I had 
invented, at the time) in 1982.
But today we're seeing solid state storage seriously challenging 
rotating media across all applications, except at the TB and beyond 
scale. That's what's new.






Re: [PERFORM] GPU Accelerated Sorting

2010-08-30 Thread David Boreham
 Feels like I fell through a worm hole in space/time, back to inmos in 
1987, and a guy from marketing has just
walked in the office going on about there's a customer who wants to use 
our massively parallel hardware to speed up databases...






Re: [PERFORM] GPU Accelerated Sorting

2010-08-30 Thread David Boreham

 On 8/30/2010 3:18 PM, Chris Browne wrote:

... As long as you're willing to rewrite PostgreSQL in Occam 2...


Just re-write it in Google's new language 'Go' : it's close enough to 
Occam and they'd probably fund the project..


;)





Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread David Boreham


Do you guys have any more ideas to properly 'feel this disk at its 
teeth' ?


While an 'end-to-end' test using PG is fine, I think it would be easier 
to determine if the drive is behaving correctly by using a simple test 
program that emulates the storage semantics the WAL expects. Have it 
write a constant stream of records, fsync'ing after each write. Record 
the highest record number flushed so far in some place that won't be 
lost with the drive under test (e.g. send it over the network to another 
machine).


Kill the power, bring the system back up again and examine what's at the 
tail end of that file. I think this will give you the worst case test 
with the easiest result discrimination.


If you want to you could add concurrent random writes to another file 
for extra realism.


Someone here may already have a suitable test program. I know I've 
written several over the years in order to test I/O performance, prove 
the existence of kernel bugs, and so on.


I doubt it matters much how long the power is turned of. A second should 
be plenty time to flush pending writes if the drive is going to do so.
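A minimal sketch of that record-stream test (Python purely for illustration; the real harness would send the flushed counter over the network to a second machine before cutting power, rather than trusting anything on the machine under test):

```python
import os
import struct

REC = struct.Struct("<Q")  # one 64-bit record number per entry

def write_records(path, count):
    """Append sequential record numbers, fsync'ing after each write.
    Returns the highest number known to be flushed; the real test would
    report this to another machine before the power is cut."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    flushed = -1
    try:
        for n in range(count):
            os.write(fd, REC.pack(n))
            os.fsync(fd)
            flushed = n
    finally:
        os.close(fd)
    return flushed

def tail_record(path):
    """After the power-cycle: the last complete record found on disk must
    be at least the highest number reported flushed, else the drive lied
    about the fsync."""
    usable = os.path.getsize(path) - os.path.getsize(path) % REC.size
    with open(path, "rb") as f:
        f.seek(usable - REC.size)
        return REC.unpack(f.read(REC.size))[0]
```

The pass/fail criterion is then simply: `tail_record()` on reboot must be >= the last value the remote machine received.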






Re: [PERFORM] performance on new linux box

2010-07-08 Thread David Boreham

On 7/8/2010 1:47 PM, Ryan Wexler wrote:
Thanks for the explanations that makes things clearer.  It still 
amazes me that it would account for a 5x change in IO.


The buffering allows decoupling of the write rate from the disk rotation 
speed.
Disks don't spin that fast, at least not relative to the speed the CPU 
is running at.







Re: [PERFORM] performance on new linux box

2010-07-08 Thread David Boreham

On 7/8/2010 3:18 PM, timothy.noo...@emc.com wrote:

How does the linux machine know that there is a BBU installed and to
change its behavior or change the behavior of Postgres? I am
experiencing performance issues, not with searching but more with IO.
   
It doesn't change its behavior at all. It's in the business of writing 
stuff to a file and waiting until that stuff has been put on the disk 
(it wants a durable write). What the write buffer/cache does is to 
inform the OS, and hence PG, that the write has been done when in fact 
it hasn't (yet). So the change in behavior is only to the extent that 
the application doesn't spend as much time waiting.






Re: [PERFORM] Linux/PostgreSQL scalability issue - problem with 8 cores

2008-01-04 Thread David Boreham

James Mansion wrote:

Jakub Ouhrabka wrote:

How can we diagnose what is happening during the peaks?
Can you try forcing a core from a bunch of the busy processes?  (Hmm - 
does Linux have an equivalent to the useful Solaris pstacks?)
There's a 'pstack' for Linux, shipped at least in Red Hat distributions 
(and possibly others,
I'm not sure). It's a shell script wrapper around gdb, so easily ported 
to any Linux.




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-20 Thread David Boreham

Tom Lane wrote:

Having malloc/free use
an internal mutex is necessary in multi-threaded programs, but the
backend isn't multi-threaded.  
  
Hmm...confused. I'm not following why then there is contention for the 
mutex.
Surely this has to be some other mutex that is in contention, not a heap 
lock ?


It'd be handy to see the call stack for the wait state -- if the thing 
is spending a significant proportion of its time in contention it should 
be easy to get that with a simple tool such as pstack or a debugger.





Re: [PERFORM] Performance Tuning and Disk Cache

2007-03-18 Thread David Boreham

Barry Moore wrote:

I have a very slow query that I'm trying to tune.  I think my  
performance tuning is being complicated by the system's page cache.


If a run the query after the system has been busy with other tasks  
for quite a long time then the query can take up to 8-10 minutes to  
complete.  If I then rerun the same query it will complete in a  
couple of seconds.


Does anyone know how I can repeatedly run the same query in the  
worst case scenario of no postgres data in the disk cache (e.g.,  
clear the page cache or force it to be ignored)?


In my experience the only 100% reliable way to do this is to reboot the 
machine.
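On Linux there are now lighter-weight (if less airtight) options than a reboot. A sketch, Python for illustration: per-file eviction via posix_fadvise, which is advisory only, so the kernel is not obliged to comply; as root, `sync; echo 3 > /proc/sys/vm/drop_caches` drops the whole page cache instead.

```python
import os

def evict_from_page_cache(path):
    """Advise the kernel to drop this file's pages from the page cache
    (Linux-specific; advisory, so eviction is not guaranteed)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # dirty pages cannot be dropped, so flush them first
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

Running this over the table and index files between query runs approximates the cold-cache case without restarting the box, though only a reboot removes all doubt.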






Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-12 Thread David Boreham

Alexander Staubo wrote:

No, fsync=on. The tps values are similarly unstable with fsync=off,  
though -- I'm seeing bursts of high tps values followed by low-tps  
valleys, a kind of staccato flow indicative of a write caching being  
filled up and flushed.


Databases with checkpointing typically exhibit this cyclical throughput 
syndrome (put another way: this is to be expected, and you are correct 
that it indicates buffered data being flushed to disk periodically).






Re: [PERFORM] Low throughput of binary inserts from windows to linux

2006-12-12 Thread David Boreham

Tom Lane wrote:


In case I was mistaken, this explanation makes perfect sense to me.
But then again it would indicate a 'bug' in libpq, in the sense that
it (apparently) sets TCP_NODELAY on linux but not on windows.
   



No, it would mean a bug in Windows in that it fails to honor TCP_NODELAY.
 

Last time I did battle with nagle/delayed ack interaction in windows 
(the other end has to be another stack implementation -- windows to 
itself I don't think has the problem), it _did_ honor TCP_NODELAY. That 
was a while ago (1997) but I'd be surprised if things have changed much 
since then.

Basically nagle has to be turned off for protocols like this 
(request/response interaction
over TCP) otherwise you'll sometimes end up with stalls waiting for the 
delayed ack
before sending, which in turn results in very low throughput, per 
connection. As I remember
Windows client talking to Solaris server had the problem, but various 
other permutations of client and server stack implementation did not.
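The fix being discussed, in socket-API terms (Python purely for illustration; the same setsockopt call applies in C, and is apparently what libpq issues on Linux):

```python
import socket

def connect_nodelay(host, port):
    """Open a TCP connection with Nagle's algorithm disabled, so small
    request/response writes go out immediately instead of stalling
    behind the peer's delayed ACK."""
    s = socket.create_connection((host, port))
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```

Without this, a write-write-read pattern over TCP can stall for the delayed-ACK timer (often ~40-200 ms) on every exchange, which is exactly the per-connection throughput collapse described above.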







Re: RES: [PERFORM] Bad iostat numbers

2006-12-01 Thread David Boreham

Carlos H. Reimer wrote:


I've taken a look in the /var/log/messages and found some temperature
messages about the disk drives:

Nov 30 11:08:07 totall smartd[1620]: Device: /dev/sda, Temperature changed 2
Celsius to 51 Celsius since last report

Can this temperature influence in the performance?
 


it can influence 'working-ness' which I guess in turn affects performance ;)

But I'm not sure if 50C is too high for a disk drive, it might be ok.

If you are able to, I'd say just replace the drives and see if that 
improves things.






Re: [PERFORM] Bad iostat numbers

2006-11-30 Thread David Boreham

Carlos H. Reimer wrote:

 


avg-cpu:  %user   %nice %system %iowait   %idle
          50.40    0.00    0.50    1.10   48.00

Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s  wkB/s avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00   7.80  0.40  6.40   41.60  113.60  20.80  56.80    22.82 570697.50  10.59 147.06 100.00
sdb        0.20   7.80  0.60  6.40   40.00  113.60  20.00  56.80    21.94 570697.50   9.83 142.86 100.00
md1        0.00   0.00  1.20 13.40   81.60  107.20  40.80  53.60    12.93      0.00   0.00   0.00   0.00
md0        0.00   0.00  0.00  0.00    0.00    0.00   0.00   0.00     0.00      0.00   0.00   0.00   0.00

 


Are they not saturated?

 

What kind of parameters should I pay attention when comparing SCSI 
controllers and disks? I would like to discover how much cache is 
present in the controller, how can I find this value from Linux?



These number look a bit strange. I am wondering if there is a hardware 
problem on one of the drives
or on the controller. Check in syslog for messages about disk timeouts 
etc. 100% util but 6 writes/s is just wrong (unless the drive is a 
1980's vintage floppy).





Re: [PERFORM] measuring shared memory usage on Windows

2006-10-20 Thread David Boreham

Unfortunately often operating system virtual memory
and filesystem caching code that does exactly the opposite of
what a database application would like.
For some reason the kernel guys don't see it that way ;)

Over the years there have been various kernel features added
with the overall goal of solving problems in this area : O_DIRECT,
ckrm, flags to mmap() and so on. So far I'm not sure any of
them has really succeeded. Hence the real-world wisdom to
'let the filesystem cache do its thing' and configure a small
shared memory cache in the application.

Ideally one would want to use O_DIRECT or equivalent
to bypass the OS's cleverness and manage the filesystem
caching oneself. However it turns out that enabling O_DIRECT
makes things much worse not better (YMMV obviously).
It's hard to achieve the level of concurrency that the kernel
can get for disk I/O, from user mode.

Another approach is to make the application cache size
dynamic, with the goal that it can grow and shrink to
reach the size that provides the best overall performance.
I've seen attempts to drive the sizing using memory access
latency measurements done from user mode inside the application.
However I'm not sure that anyone has taken this approach
beyond the science project stage.

So AFAIK this is still a generally unsolved problem.

NT (Windows) is particularly interesting because it drives the
filesystem cache sizing with a signal that it measures from the
VM pages evicted per second counter. In order to keep its
feedback loop stable, the OS wants to see a non-zero value
for this signal at all times. So you will see that even under
ideal conditions the system will still page a little.
(Unless that code has changed in Win2003 -- it's been a
while since I checked). So don't drive yourself crazy trying
to get it to stop paging ;)












Re: [PERFORM] measuring shared memory usage on Windows

2006-10-16 Thread David Boreham



I learned the hard way that just rising it can lead to a hard
performance loss :)


I looked back in the list archives to try to find your post on the
underlying problem, but could only find this rather terse sentence.
If you have more detailed information please post or point me at it.

But...my first thought is that you have configured the shared memory
region so large that the system as a whole can not fit all the working set
sizes for all running processes in to physical memory. This is a common
pitfall for databases with caches implemented as mapped shared
user space regions (which is basically all databases).

For example, if you have 1G of RAM on the box, you can't
configure a cache of 900 meg and expect things to work well.
This is because the OS and associated other stuff running on
the box will use ~300megs. The system will page as a result.

The only sure fire way I know of to find the absolute maximum
cache size that can be safely configured is to experiment with
larger and larger sizes until paging occurs, then back off a bit.








Re: [PERFORM] measuring shared memory usage on Windows

2006-10-16 Thread David Boreham



really makes me think that that area is just a comfortable way to
access files on disk as memory areas; with the hope of propably better
caching then not-memory-mapped files.


No, absolutely not. CreateFileMaping() does much the same thing
as mmap() in Unix.


That would explain my disturbing impressions of performance of
PostgreSQL on win32 rising when lowering shared_memory...


I don't know what your disturbing impressions are, but no it
doesn't explain them.





Re: [PERFORM] Performance penalty for remote access of postgresql

2006-07-20 Thread David Boreham

Guoping Zhang wrote:



a)  SERVER A to SERVER B: 0.35ms
   SERVER A to itself (Local host): 0.022ms

 


0.35ms seems rather slow. You might try investigating what's in the path.
For comparison, between two machines here (three GigE switches in the
path), I see 0.10ms RTT. Between two machines on the same switch I
get 0.08ms.
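For anyone wanting to reproduce such a comparison at the application layer, a sketch (Python for illustration; it assumes an echo service listening at host:port, whereas `ping` measures ICMP RTT, which is what the figures above are):

```python
import socket
import time

def measure_rtt(host, port, samples=5):
    """Median round-trip time, in milliseconds, of a 1-byte exchange with
    an echo service assumed to be listening at host:port."""
    with socket.create_connection((host, port)) as s:
        # disable Nagle so the tiny writes are sent immediately
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        times = []
        for _ in range(samples):
            t0 = time.perf_counter()
            s.sendall(b"x")
            if s.recv(1) != b"x":
                raise RuntimeError("unexpected echo payload")
            times.append((time.perf_counter() - t0) * 1000.0)
    return sorted(times)[len(times) // 2]
```

Measuring at the TCP layer like this also catches stack overheads that ICMP ping hides, which is closer to what a libpq round trip actually pays.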







Re: [PERFORM] Which processor runs better for Postgresql?

2006-06-13 Thread David Boreham



My suggestion is to look at something like this:

http://www.abmx.com/1u-supermicro-amd-opteron-rackmount-server-p-210.html

1U rackmount opteron from Supermicro that can have two dual core
opterons and 4 drives and up to 16 gigs of ram.  Supermicro server
motherboards have always treated me well and performed well too.
 


I've had good experience with similar machines from Tyan :
http://www.tyan.com/products/html/gt24b2891.html







Re: [PERFORM] 64-bit vs 32-bit performance ... backwards?

2006-06-12 Thread David Boreham

Anthony Presley wrote:


I had an interesting discussion today w/ an Enterprise DB developer and
sales person, and was told, twice, that the 64-bit linux version of
Enterprise DB (which is based on the 64-bit version of PostgreSQL 8.1)
is SIGNIFICANTLY SLOWER than the 32-bit version.  Since the guys of EDB
are PostgreSQL . has anyone seen that the 64-bit is slower than the
32-bit version?

I was told that the added 32-bits puts a strain and extra overhead
on the processor / etc which actually slows down the pointers and
necessary back-end stuff on the database.

I'm curious if anyone can back this up  or debunk it.  It's about
the polar opposite of everything I've heard from every other database
vendor for the past several years, and would be quite an eye-opener for
me.
 

What they are saying is strictly true : 64-bit pointers tend to increase 
the working set size
of an application vs. 32-bit pointers. This means that any caches will 
have somewhat lower
hit ratio. Also the bytes/s between the CPU and memory will be higher 
due to moving those larger pointers.
In the case of a 32-bit OS this also applies to the kernel so the effect 
will be system-wide.


However, an application that needs to work on more than around 2G of 
data will in the end be
much faster 64-bit due to reduced I/O (it can keep more of the data in 
memory).


I worked on porting a large database application from 32-bit to 64-bit. One
of our customers required us to retain the 32-bit version because of 
this phenomenon.


In measurements I conducted on that application, the performance 
difference wasn't great (10% or so), but it was measurable. This was 
with Sun Sparc hardware.
It is possible that more modern CPU designs have more efficient 64-bit
implementation than 32-bit, so the opposite might be seen too.

Whether or not PG would show the same thing I can't say for sure. 
Probably it would though.













Re: [PERFORM] scaling up postgres

2006-06-03 Thread David Boreham


I cannot scale beyond that value and the funny thing, is that none of the 
servers is swapping, or heavy loaded, neither postgres nor apache are

refusing connexions.
 


Hearing a story like this (throughput hits a hard limit, but
hardware doesn't appear to be 100% utilized), I'd suspect
insufficient concurrency to account for the network latency
between the two servers.

Also check that your disks aren't saturating somewhere (with
iostat or something similar).

You could run pstack against both processes and see what they're
doing while the system is under load. That might give a clue
(e.g. you might see the apache processs waiting on
a response from PG, and the PG processes waiting
on a new query to process).

Since you've proved that your test client and apache can
handle a much higher throughput, the problem must lie
somewhere else (in posgresql or the interface between
the web server and postgresql).











Re: [PERFORM] scaling up postgres

2006-06-03 Thread David Boreham




Tom Lane wrote:

  [EMAIL PROTECTED] writes:
  
  
I'm using httperf/autobench for measurments and the best result I can get 
is that my system can handle a trafiic of almost 1600 New con/sec.

  
  
As per PFC's comment, if connections/sec is a bottleneck for you then
the answer is to use persistent connections.  Launching a new backend
is a fairly heavyweight operation in Postgres.  

I thought the OP was talking about HTTP connections/s. He didn't say if
he
was using persistent database connections or not (obviously better if
so).
If it were the case that his setup is new backend launch rate-limited,
then 
wouldn't the machine show CPU saturation ? (he said it didn't).






Re: [PERFORM] Memory and/or cache issues?

2006-05-05 Thread David Boreham





2006-05-04 18:04:58 EDT USER=postgres DB=FIX1 [12427] PORT
= [local] ERROR: invalid memory alloc request size 18446744073709551613

Perhaps I'm off beam here, but any
time I've seen an app try to allocate a gazillion bytes, it's 
due to some code incorrectly calculating the size of something (or more
commonly, using an uninitialized variable as the basis for said
calculation).
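As an aside, the particular number in that log fits this theory exactly; a quick check (Python used here just for the arithmetic):

```python
# 18446744073709551613 is 2**64 - 3: a small negative value (-3), e.g. from
# a length calculation that underflowed, reinterpreted as an unsigned
# 64-bit size before reaching the allocator.
bad_size = 18446744073709551613
assert bad_size == 2**64 - 3
assert bad_size == (-3) % 2**64
print("the alloc request is -3 cast to uint64")
```

So the thing to hunt for is whichever size computation can go three bytes negative.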







Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-26 Thread David Boreham




The reason AMD is has held off from supporting DDR2 until now are:
1.  DDR is EOL.  JEDEC is not ratifying any DDR faster than 200x2 while DDR2 
standards as fast as 333x4 are likely to be ratified (note that Intel pretty 
much avoided DDR, leaving it to AMD, while DDR2 is Intel's main RAM technology. 
 Guess who has more pull with JEDEC?)

 


DDR2 is to RDRAM as C# is to Java

;)





Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-26 Thread David Boreham


While in general there may not be that much of a % difference between 
the 2 chips,
there's a huge gap in Postgres. For whatever reason, Postgres likes 
Opterons.

Way more than Intel P4-architecture chips.

It isn't only Postgres. I work on a number of other server applications
that also run much faster on Opterons than the published benchmark
figures would suggest they should. They're all compiled with gcc4,
so possibly there's a compiler issue. I don't run Windows on any
of our Opteron boxes so I can't easily compare using the MS compiler.







Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread David Boreham






  Actually, that was from an article from this last month that compared
the dual core intel to the amd.  for every dollar spent on the intel,
you got about half the performance of the amd.  Not bigotry.  fact.

But don't believe me or the other people who've seen the difference.  Go
buy the Intel box.  No skin off my back.
  

I've been doing plenty of performance evaluation on a parallel
application
we're developing here : on Dual Core Opterons, P4, P4D. I can say that
the Opterons open up a can of wupass on the Intel processors. Almost 2x
the performance on our application vs. what the SpecCPU numbers would 
suggest.






Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread David Boreham



My personal favorite pg platform at this time is one based on a 2 socket, dual 
core ready mainboard with 16 DIMM slots combined with dual core AMD Kx's.
 


Right. We've been buying Tyan bare-bones boxes like this.
It's better to go with bare-bones than building boxes from bare metal
because the cooling issues are addressed correctly.

Note that if you need a large number of machines, then Intel
Core Duo may give the best overall price/performance because
they're cheaper to run and cool.






Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread David Boreham

Brendan Duddridge wrote:

Thanks for your reply. So how is that different than something like  
Slony2 or pgcluster with multi-master replication? Is it similar  
technology? We're currently looking for a good clustering solution  
that will work on our Apple Xserves and Xserve RAIDs.


I think you need to be more specific about what you're trying to do.
'clustering' encompasses so many things that it means almost nothing by 
itself.


slony provides facilities for replicating data. Its primary purpose is
to improve reliability. MPP distributes both data and queries. Its
primary purpose is to improve performance for a subset of all query types.





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-17 Thread David Boreham

Alan Stange wrote:


Not sure I get your point. We would want the lighter one,
all things being equal, right ? (lower shipping costs, less likely
to break when dropped on the floor)


Why would the lighter one be less likely to break when dropped on the 
floor?


They'd have less kinetic energy upon impact.





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-16 Thread David Boreham

Steve Wampler wrote:


Joshua D. Drake wrote:
 


The reason you want the dual core cpus is that PostgreSQL can only
execute 1 query per cpu at a time,...
   



Is that true?  I knew that PG only used one cpu per query, but how
does PG know how many CPUs there are to limit the number of queries?

 

He means only one query can be executing on each cpu at any particular 
instant.






Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-16 Thread David Boreham

Spend a fortune on dual core CPUs and then buy crappy disks...  I bet
for most applications this system will be IO bound, and you will see a
nice lot of drive failures in the first year of operation with
consumer grade drives.

I guess I've never bought into the vendor story that there are
two reliability grades. Why would they bother making two
different kinds of bearing, motor etc ? Seems like it's more
likely an excuse to justify higher prices. In my experience the
expensive SCSI drives I own break frequently while the cheapo
desktop drives just keep chunking along (modulo certain products
that have a specific known reliability problem).

I'd expect that a larger number of hotter drives will give a less reliable
system than a smaller number of cooler ones.





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-16 Thread David Boreham






  
I suggest you read this on the difference between enterprise/SCSI and
desktop/IDE drives:

	http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

  

This is exactly the kind of vendor propaganda I was talking about
and it proves my point quite well : that there's nothing specific
relating
to reliability that is different between SCSI and SATA drives cited in
that paper.
It does have a bunch of FUD such as 'oh yeah we do a lot more
drive characterization during manufacturing'.







Re: [PERFORM] Postgres recovery time

2005-11-14 Thread David Boreham




Piccarello, James (James) wrote:

  
  
  Postgres recovery time

  Does anyone know what factors affect
the recovery time of postgres if it does not shutdown cleanly? With the
same size database I've seen times from a few seconds to a few
minutes. The longest time was 33 minutes. The 33 minutes was after a
complete system crash and reboot so there are a lot of other things
going on as well. 125 seconds was the longest time I could reproduce by
just doing a kill -9 on postmaster. 
  Is it the size of the transaction log?
The dead space in files? 
  

I don't know much about postgresql,
but typically WAL mechanisms
will exhibit recovery times that are bounded by the amount of log record
data written since the last checkpoint. The 'worst' case will be where 
you have continuous writes to the database and a long checkpoint
interval. In that case many log records must be replayed into the
data files upon recovery. The 'best' case would be zero write
transactions
since the last checkpoint. In that case recovery would be swift since
there are no live records to recover. In your tests you are probably
exercising this 'best' or near best case.
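The bound described above can be written as a one-line model (a rough sketch: the replay rate is a made-up parameter you would measure on your own hardware, and real recovery also pays random-I/O costs that this ignores):

```python
def est_recovery_seconds(wal_bytes_since_checkpoint, replay_bytes_per_sec):
    """Crude model: redo recovery replays everything written to the log
    since the last checkpoint, so recovery time scales with that volume."""
    return wal_bytes_since_checkpoint / replay_bytes_per_sec

# e.g. 2 GiB of WAL replayed at 20 MiB/s:
print(est_recovery_seconds(2 * 2**30, 20 * 2**20))  # 102.4 seconds
```

This is why the kill -9 tests recover quickly (little WAL since the last checkpoint) while a crash during sustained writes with a long checkpoint interval can take many minutes.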