Re: [PERFORM] Drop Tables Very Slow in Postgresql 7.2.1

2004-03-14 Thread Andrew Sullivan
On Wed, Mar 10, 2004 at 12:33:01PM +0530, Maneesha Nunes wrote:
 Hello there!
 
 I am using PostgreSQL 7.2.1 as the backend for an E.R.P. system running
 on Linux Red Hat 7.2 (Enigma).

You should upgrade, at the _very least_, to the last release of the 7.2
series.  There were bugs in earlier releases that were fixed in later
ones; that's why there's a 7.2.4.  (I'll also point out that the 7.2
series is missing plenty of performance enhancements which came later.
I'd get to work on upgrading, because 7.2 is now basically unmaintained.)

But in any case, you likely have bloat in your system tables.  I'd
do a VACUUM FULL and a complete REINDEX of the system tables next.
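
(For reference, a minimal sketch of that maintenance.  The catalogs below
are just the ones most affected by heavy create/drop activity, and on 7.2
a REINDEX of system tables may have to be run from a standalone backend -
check the REINDEX documentation for that release:)

  -- Run as a database superuser; VACUUM FULL reclaims catalog space.
  VACUUM FULL pg_class;
  VACUUM FULL pg_attribute;
  VACUUM FULL pg_index;

  -- Rebuild the catalog indexes afterwards.
  REINDEX TABLE pg_class;
  REINDEX TABLE pg_attribute;
  REINDEX TABLE pg_index;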

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.
--J.D. Baldwin



Re: [PERFORM] [ADMIN] syslog slowing the database?

2004-03-14 Thread Andrew Sullivan
On Thu, Mar 11, 2004 at 09:34:54AM +0800, Christopher Kings-Lynne wrote:
  You could also consider not using syslog at all: let the postmaster
  output to its stderr, and pipe that into a log-rotation program.
 
 Not an option I'm afraid.  PostgreSQL just jams and stops logging after
 the first rotation...

Actually, this is what we do.  Last year we offered an (admittedly
expensive) bespoke log rotator written in Perl for just this purpose. 
It was rejected on the grounds that it didn't do anything Apache's
rotator didn't do, so I didn't pursue it.  I'm willing to put it up
on gborg, though, if anyone thinks it'll be worth having around. 
FWIW, we use ours in production.
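
(For anyone unfamiliar with the stderr-pipe setup, it generally looks like
the line below - a sketch assuming Apache's rotatelogs is installed; the
log path and rotation interval are illustrative, and any rotator that
reads stdin fits the same slot:)

  # Pipe the postmaster's output into a log rotator (86400 s = daily rotation).
  pg_ctl start 2>&1 | rotatelogs /var/log/pgsql/postgresql_log 86400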

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The plural of anecdote is not data.
--Roger Brinner



[PERFORM] Large CASE-statement is pretty slow?

2004-03-14 Thread Arjen van der Meijden
Hi list,

I was more or less toying with an idea for a project I have, which 
includes renumbering a primary key (don't ask, it's necessary :/ )

Anyway, I was looking into the usefulness of an INSERT INTO newtable 
SELECT field, field, CASE pkey WHEN x1 THEN y1 WHEN x2 THEN y2 etc. END 
FROM oldtable

The resulting select was about 1.7MB of query text, mostly composed of 
the CASE statement.  So I discarded that idea, but I still wanted to know 
how much time it would take on my database (MySQL), and found it to take 
about 1100 seconds, in contrast to simply selecting the data, which'd 
take about 0.7 seconds or so...  The table I tested this on is about 30MB.

Of course I wanted to know how long it'd take on PostgreSQL.  Selecting 
the pkey field only (without the CASE) also took some 0.7 seconds 
(selecting the entire table may have taken a bit more).
But the CASE version took 9026139.201 ms, i.e. over 9000 seconds, about 8 
times slower than MySQL.

What I'm wondering about:
Although I was not expecting PostgreSQL to beat MySQL heavily, I was 
surprised to see it so much slower.  Is the CASE statement in PostgreSQL 
that inefficient?  Or does it simply not scale well (i.e. don't try 
anywhere near as many cases as I did)?

The database is a lightly optimised Gentoo compile of PostgreSQL 7.4.2; 
the MySQL version was 4.0.18, in case anyone wants to know.

Best regards,

Arjen van der Meijden

PS: don't try to help improve the query; I discarded the idea as too 
inefficient and went with a simple left join to get the new pkey out of 
a temporary table instead (roughly as sketched below) ;)
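
(For the curious, that alternative looks roughly like this - a sketch with
made-up table and column names, not the actual schema:)

  -- Hypothetical mapping from old to new primary keys.
  CREATE TEMPORARY TABLE pkey_map (
      old_pkey integer PRIMARY KEY,
      new_pkey integer NOT NULL
  );

  -- ... fill pkey_map, then one join replaces the 1.7MB CASE expression:
  INSERT INTO newtable (pkey, field1, field2)
  SELECT m.new_pkey, o.field1, o.field2
  FROM oldtable o
  LEFT JOIN pkey_map m ON m.old_pkey = o.pkey;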





Re: [PERFORM] Scaling further up

2004-03-14 Thread Aaron Werman
Sorry about not chiming in before - I've been too swamped to think.  I agree
with most of the points, but a lot of these posts are interesting and seem
to describe systems from an SA perspective rather than my DBA-centric view.

- Original Message - 
From: Marty Scholes [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, March 10, 2004 6:29 PM
Subject: Re: [PERFORM] Scaling further up


 I have some suggestions based on my anecdotal experience.

 1. This is a relatively small DB -- the working set will likely be in
 RAM at any moment in time, making read I/O time mostly irrelevant.

 2. The killer will be write times -- specifically log writes.  Small and
 heavily synchronized writes, log and data writes, will drag down an
 impressive hardware RAID setup.  We run mirrored hardware RAID 5 arrays
 with write back cache and are constantly seeking ways to improve write
 performance.  We do a lot of batch processing, though, so we do a lot of
 write I/Os.

My experience with RAID 5 for streaming sequential writes is bad.  This is
sometimes helped by hardware caching covering the cost of the additional
I/Os for striping (write-through RAID 5 + big cache acts like RAID 1+0 until
you run out of cache).  Batch processing is different from high-concurrency
transactions because it needs faster volume streaming, while TP depends
on the speed of ack'ing (a few big writes with fewer synchronous waits vs. lots
of small writes which serialize everyone).  (RAID 3 worked for me in the past
for logging, but I haven't used it in years.)


 3. Be very careful with battery backed write cache.  It usually works
 as advertised.  More than once in the past decade I have seen
 spontaneous cache corruption after power loss.  The corruption usually
 happens when some admin, including me, has assumed that the cache will
 ALWAYS survive a power failure unblemished and has no plan B.  Make
 sure you have a contingency plan for corruption, or don't enable the
cache.

I agree strongly.  There is also the same problem with disk write-back cache,
and even with SCSI controllers with write-through enabled.  PITR would help
here.  A lot of these problems are due to procedural error after a crash.


 4. RAID 10 will likely have bigger stripe sizes on the RAID 0 portion of
 the setup, and might hinder, not help small write I/O performance.

In a high-volume system without write caching you are almost always going to
see queuing, which can make the larger buffer mostly irrelevant if it's not
huge.  Write caching thrives on big block sizes (which is a key reason why
Symmetrix doesn't do worse than it does) by reducing I/O counts.  Most shops
I've set up or seen use mirroring or RAID 10 for logs.  Note also that many
RAID 10 controllers in a non-write-cached setup allow a race between
the two mirrored writes, acknowledging when the first of the two completes -
increasing throughput by about 1/4.


 5. Most (almost all) of the I/O time will be due to the access time
 (head seek + head settle + rotational latency) and very little of the
 I/O time will be due to data transfer time.  In other words, getting drives
 that provide faster transfer rates will barely improve performance.  The
 secret is lowering the access time.

True. This is very much a latency story.  Even in high-volume batch work,
access times often show that some other system-configuration bottleneck is
hit elsewhere before you ever reach I/O capacity.


 6. A relatively cheap way to drastically drop the access time is to get
 large drive(s) and only use a portion of them for storage.  The less
 space used on the drive, the less area the heads need to cover for
 seeks.  At one extreme, you could make the partition the size of a
 single cylinder.  This would make access time (ignoring OS and
 controller overhead) identical to rotational latency, which is as low as
 4.2 ms for a cheap 7200 RPM drive.

This is a good strategy for VLDB, and may not be relevant in this case.
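
(The 4.2 ms figure quoted above is simply the average rotational latency of
a 7200 RPM spindle, i.e. half a revolution:)

  60,000 ms/min / 7,200 rev/min ≈ 8.3 ms per revolution
  8.3 ms / 2                    ≈ 4.2 ms average rotational latency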

Also - big sequential writes on 15K RPM drives, in the write-through case,
are a beautiful thing - they look like a manufacturer's demo.  A
primary performance role of an RDBMS is to convert random I/O to sequential
(by buffering reads and using a streaming log to defer random writes to
checkpoints).  RDBMSs are the prime beneficiaries of drive speed
improvements, since logging, backups, and copies are about the only things
(ignoring bad perl scripts and find commands) that generate loads of 50+
MB/sec.

/Aaron



[PERFORM] Deadlocks...

2004-03-14 Thread postgres
Hi,

I have two functions: one is called by page visitors (about 2,000 times per hour)
and the other only by the admin (say once per hour, or once per login).
I often get a deadlock error after calling the admin function;
yes, they access the same table somewhere in the function code.
The admin function can take up to 20-30 seconds, while the visitor function takes just 20 to 30 ms.
What can I do about this?  Do I have to study a lot about table locking, or is it something else?
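
(For readers who haven't seen it, this is the classic shape of such a
deadlock - a sketch with a made-up table, not Boris's actual schema:)

  -- Session A (the long admin function):
  BEGIN;
  UPDATE stats SET hits = hits + 1 WHERE page_id = 1;
  -- ... long-running work ...
  UPDATE stats SET hits = hits + 1 WHERE page_id = 2;  -- now waits for session B

  -- Session B (a visitor function, started in between):
  BEGIN;
  UPDATE stats SET hits = hits + 1 WHERE page_id = 2;
  UPDATE stats SET hits = hits + 1 WHERE page_id = 1;  -- waits for session A:
                                                       -- deadlock, one is aborted

The usual cure is to touch rows and tables in the same order in every
function, keep the long admin transaction as short as possible, or have it
take its locks up front (e.g. SELECT ... FOR UPDATE or LOCK TABLE) before
doing the slow work.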

Thanks for your help.
regards,
Boris