Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Robert Lor

Arjen van der Meijden wrote:



I can already confirm very good scalability (with our workload) on 
postgresql on that machine. We've been testing a 32-thread/16GB version 
and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores 
(with all four threads enabled).


The threads are a bit less scalable, but still pretty good: going from 1 
to 2 or 4 threads per core yields respectively 60% and 130% extra 
performance.


Wow, what type of workload is it? And did you do much tuning to get 
near-linear scalability to 32 threads?


Regards,
-Robert



Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Tatsuo Ishii
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > Interesting. We (some Japanese companies including SRA OSS,
> > Inc. Japan) did some PG scalability testing using a big Unisys machine
> > with 16 physical CPUs and found PG scales up to 8 CPUs; beyond 8 CPUs,
> > PG does not scale anymore. The result can be viewed at the
> > "OSS iPedia" web site (http://ossipedia.ipa.go.jp). Our conclusion,
> > from analyzing the oprofile results, was that PG has a serious lock
> > contention problem in that environment.
> 
> 18% in s_lock is definitely bad :-(.  Were you able to determine which
> LWLock(s) are accounting for the contention?

Yes. We were interested in that too. Some people did additional tests
to determine that. I don't have the report handy right now; I will report
back next week.

> The test case seems to be spending a remarkable amount of time in LIKE
> comparisons, too.  That probably is not a representative condition.

I know. I think the point is that the 18% in s_lock only appears with 12 CPUs or more.
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> Interesting. We (some Japanese companies including SRA OSS,
> Inc. Japan) did some PG scalability testing using a big Unisys machine
> with 16 physical CPUs and found PG scales up to 8 CPUs; beyond 8 CPUs,
> PG does not scale anymore. The result can be viewed at the
> "OSS iPedia" web site (http://ossipedia.ipa.go.jp). Our conclusion,
> from analyzing the oprofile results, was that PG has a serious lock
> contention problem in that environment.

18% in s_lock is definitely bad :-(.  Were you able to determine which
LWLock(s) are accounting for the contention?

The test case seems to be spending a remarkable amount of time in LIKE
comparisons, too.  That probably is not a representative condition.

regards, tom lane



Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Tatsuo Ishii
> I am thrilled to inform you all that Sun has just donated a fully loaded 
> T2000 system to the PostgreSQL community, and it's being set up by Corey 
> Shields at OSL (osuosl.org) and should be online probably early next 
> week. The system has
> 
> * 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as 
> having 32 virtual CPUs, and each can be enabled or disabled individually
> * 32 GB of DDR2 SDRAM memory
> * 2 @ 73GB internal SAS drives (10K RPM)
> * 4 Gigabit ethernet ports
> 
> For complete spec, visit 
> http://www.sun.com/servers/coolthreads/t2000/specifications.jsp
> 
> I think this system is well suited for PG scalability testing, among 
> others. We did an informal test using an internal OLTP benchmark and 
> noticed that PG can scale to around 8 CPUs. Would be really cool if all 
> 32 virtual CPUs can be utilized!!!

Interesting. We (some Japanese companies including SRA OSS,
Inc. Japan) did some PG scalability testing using a big Unisys machine
with 16 physical CPUs and found PG scales up to 8 CPUs; beyond 8 CPUs,
PG does not scale anymore. The result can be viewed at the
"OSS iPedia" web site (http://ossipedia.ipa.go.jp). Our conclusion,
from analyzing the oprofile results, was that PG has a serious lock
contention problem in that environment.

You can take a look at the detailed report at:
http://ossipedia.ipa.go.jp/capacity/EV0604210111/
(unfortunately only Japanese content is available at the
moment; please use an automatic translation service)

Evaluation environment was:
PostgreSQL 8.1.2
OSDL DBT-1 2.1
Miracle Linux 4.0
Unisys ES7000 Xeon 2.8GHz CPU x 16, Mem 16GB (HT off)
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Josh Berkus
Arjen,

> I can already confirm very good scalability (with our workload) on
> postgresql on that machine. We've been testing a 32-thread/16GB version
> and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores
> (with all four threads enabled).

Keen.   We're trying to keep the linear scaling going up to 32 cores of 
course (which doesn't happen, presently).  Would you be interested in 
helping us troubleshoot some of the performance issues?

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Arjen van der Meijden

On 16-6-2006 17:18, Robert Lor wrote:


I think this system is well suited for PG scalability testing, among 
others. We did an informal test using an internal OLTP benchmark and 
noticed that PG can scale to around 8 CPUs. Would be really cool if all 
32 virtual CPUs can be utilized!!!


I can already confirm very good scalability (with our workload) on 
postgresql on that machine. We've been testing a 32-thread/16GB version 
and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores 
(with all four threads enabled).


The threads are a bit less scalable, but still pretty good: going from 1 
to 2 or 4 threads per core yields respectively 60% and 130% extra performance.


Best regards,

Arjen



Re: [PERFORM] Question about clustering multiple columns

2006-06-16 Thread Bruno Wolff III
On Fri, Jun 16, 2006 at 11:11:59 -0700,
  Benjamin Arai <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> Thanks for the reply. I have one more question.  Does it matter in which
> order that I make the index?

Please keep replies copied to the lists so that other people can learn from
and contribute to the discussion.

In this case I am just going to copy back to the performance list, since it
is generally better for performance questions than the general list.

> For example, should I create an index on cusip,date or date,cusip - does it
> matter which order?  My goal is to cluster the entries by cusip, then for
> each cusip order the data by date (maybe the order by date occurs
> automatically).  Hm, in that case maybe I only need to cluster by cusip, but
> then how do I ensure that each cusip has its data ordered by date?

I think that you want to order by cusip (assuming that corresponds to "name"
in your sample query below) first. You won't end up having to go through values
in the index that will be filtered out if you do it that way.

The documentation for the cluster command says that it clusters on indexes,
not columns. So if the index is on (cusip, date), then the records will be
ordered by cusip, date immediately after the cluster. (New records added
after the cluster are not guaranteed to be ordered by the index.)
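
For example, a minimal sketch (the table and index names are hypothetical;
the CLUSTER syntax shown is the 8.1-era form):

    -- Composite index: cusip first, then date, matching a "filter by cusip,
    -- then scan a date range" access pattern.
    CREATE INDEX holdings_cusip_date_idx ON holdings (cusip, date);

    -- Physically reorder the table by that index (one-time; re-run after
    -- large batches of inserts to restore the ordering).
    CLUSTER holdings_cusip_date_idx ON holdings;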

> 
> Benjamin
> 
> -Original Message-
> From: Bruno Wolff III [mailto:[EMAIL PROTECTED] 
> Sent: Friday, June 16, 2006 8:32 AM
> To: Benjamin Arai
> Cc: pgsql-general@postgresql.org; pgsql-performance@postgresql.org
> Subject: Re: Question about clustering multiple columns
> 
> On Tue, Jun 13, 2006 at 09:04:15 -0700,
>   Benjamin Arai <[EMAIL PROTECTED]> wrote:
> > Hi,
> >  
> > I have a database where there are three columns (name,date,data).  The 
> > queries are almost always something like SELECT date,data FROM table 
> > WHERE name=blah AND date > 1/1/2005 AND date < 1/1/2006;.  I currently 
> > have three B-tree indexes, one for each of the columns.  Is clustering 
> > on the date index going to be what I want, or do I need an index that 
> > contains both name and date?
> 
> I would expect that clustering on the name would be better for the above
> query.
> You probably want an index on name and date combined.
> 
> 



Re: [PERFORM] SAN performance mystery

2006-06-16 Thread Jeff Trout


On Jun 16, 2006, at 5:11 AM, Tim Allen wrote:


One curious thing is that some postgres backends seem to spend an  
inordinate amount of time in uninterruptible iowait state. I found  
a posting to this list from December 2004 from someone who reported  
that very same thing. For example, bringing down postgres on the  
customer box requires kill -9, because there are invariably one or  
two processes so deeply uninterruptible as to not respond to a  
politer signal. That indicates something not quite right, doesn't it?




Sounds like there could be a driver/array/kernel bug there that is  
kicking the performance down the tube.

If it were PG's fault, it wouldn't be stuck uninterruptible.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.dellsmartexitin.com/
http://www.stuarthamm.net/






Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Josh Berkus
Folks,

> I am thrilled to inform you all that Sun has just donated a fully loaded
> T2000 system to the PostgreSQL community, and it's being set up by Corey
> Shields at OSL (osuosl.org) and should be online probably early next
> week. The system has

So this system will be hosted by Open Source Lab in Oregon.  It's going to 
be "donated" to Software In the Public Interest, who will own it for the 
PostgreSQL fund.

We'll want to figure out a scheduling system to schedule performance and 
compatibility testing on this machine; I'm not sure exactly how that will 
work.  Suggestions welcome.  As a warning, Gavin Sherry and I have a bunch 
of pending tests already to run.

First thing as soon as I have a login, of course, is to set up a Buildfarm 
instance.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [PERFORM] Question about clustering multiple columns

2006-06-16 Thread Bruno Wolff III
On Tue, Jun 13, 2006 at 09:04:15 -0700,
  Benjamin Arai <[EMAIL PROTECTED]> wrote:
> Hi,
>  
> I have a database where there are three columns (name,date,data).  The
> queries are almost always something like SELECT date,data FROM table WHERE
> name=blah AND date > 1/1/2005 AND date < 1/1/2006;.  I currently have three
> B-tree indexes, one for each of the columns.  Is clustering on the date index
> going to be what I want, or do I need an index that contains both name and
> date?

I would expect that clustering on the name would be better for the above
query.
You probably want an index on name and date combined.
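
As a sketch for the sample query above (hypothetical table name):

    -- Supports WHERE name = ... AND date > ... AND date < ... with a single
    -- index scan, and is also the natural index to CLUSTER on.
    CREATE INDEX prices_name_date_idx ON prices (name, date);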



[PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Robert Lor


I am thrilled to inform you all that Sun has just donated a fully loaded 
T2000 system to the PostgreSQL community, and it's being set up by Corey 
Shields at OSL (osuosl.org) and should be online probably early next 
week. The system has


* 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as 
having 32 virtual CPUs, and each can be enabled or disabled individually

* 32 GB of DDR2 SDRAM memory
* 2 @ 73GB internal SAS drives (10K RPM)
* 4 Gigabit ethernet ports

For complete specs, visit 
http://www.sun.com/servers/coolthreads/t2000/specifications.jsp


I think this system is well suited for PG scalability testing, among 
others. We did an informal test using an internal OLTP benchmark and 
noticed that PG can scale to around 8 CPUs. Would be really cool if all 
32 virtual CPUs can be utilized!!!


Anyways, if you need to access the system for testing purposes, please 
contact Josh Berkus.


Regards,

Robert Lor
Sun Microsystems, Inc.
01-510-574-7189






Re: [PERFORM] SAN performance mystery

2006-06-16 Thread Merlin Moncure

On 6/16/06, Mikael Carneholm <[EMAIL PROTECTED]> wrote:

We've seen similar results with our EMC CX200 (fully equipped) when
compared to a single (1) SCSI disk machine. For sequential reads/writes
(import, export, updates on 5-10 30M+ row tables), performance is
downright awful. A big DB update took 5-6h in pre-prod (single SCSI),
and 10-14?h (don't recall the exact details) in production (EMC SAN).
And this was with a proprietary DB, btw - no fsync on/off affecting the
results here.


You are in good company.  We bought a Hitachi AMS200 with 2Gb FC and a
gigabyte of cache.  We were shocked and dismayed to find the unit
could do about 50 MB/sec measured from dd (yes, around the performance
of a single consumer-grade SATA drive).  It is my (unconfirmed)
belief that the unit was governed internally to encourage you to buy
the more expensive version, the AMS500, etc.

Needless to say, we sent the unit back, and are now waiting on a
Xyratex 4Gb FC-attached SAS unit.  We spoke directly to their
performance people, who told us to expect the unit to be network
bandwidth bottlenecked, as you would expect.  They were even talking
about a special mode where you could bond the dual FC ports - now
that's power.  If the unit really does what they claim, I will be back
here talking about it for sure ;)

The bottom line is that most SANs, even from some of the biggest
vendors, are simply worthless from a performance angle.  You have to
be really critical when you buy them, don't believe anything the sales
rep tells you, and make sure to negotiate in advance a return policy
if the unit does not perform.  There is tons of b.s. out there, but so
far my impression of Xyratex is really favorable (fingers crossed),
and I'm hearing lots of great stuff about them from the channel.

merlin



Re: [PERFORM] Optimizer internals

2006-06-16 Thread Jonah H. Harris

On 16 Jun 2006 09:21:01 -0400, Greg Stark <[EMAIL PROTECTED]> wrote:

Well Oracle has to do almost all that same work, it's just doing it in a
separate place called a rollback segment.


Well, it's not really the same work.  The process by which Oracle
manages UNDO is actually pretty simple and efficient, but complex in
its implementation.  There have also been some significant performance
improvements in this area in both 9i and 10g.


There are pros and cons especially where it comes
to indexes, but also where it comes to what happens
when the new record is larger than the old one.


Certainly, you want to avoid row chaining at all costs, which is why
PCTFREE is there.  I have researched update-in-place for PostgreSQL
and can avoid row-chaining... so I think we can get the same benefit
without the management and administration cost.


IMHO the biggest problem Postgres has is when you're
updating a lot of records in a table with little free space.


Yes, this is certainly the most noticeable case.  This is one reason
I'm behind the freespace patch.  Unfortunately, a lot of inexperienced
people use VACUUM FULL and don't understand why plain VACUUM (to free up
block-level free space and update the FSM) is *generally* better,
assuming they have enough hard disk space for the database.
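
As a quick illustration of the distinction (hypothetical table name):

    -- Routine maintenance: records dead-row space in the free space map
    -- for reuse, without taking an exclusive lock.
    VACUUM ANALYZE orders;

    -- Compacts the table and returns space to the operating system, but
    -- takes an exclusive lock while it runs.
    VACUUM FULL orders;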


That and of course the visibility bitmap that has been
much-discussed


I'd certainly like to see it.


I wouldn't say the benchmarks are flawed, but I also
don't think you can point to any specific design
feature and say it's essential just on the basis of
bottom-line results. You have to look at the actual
benefit of the specific wins.


True.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 2nd Floor            | [EMAIL PROTECTED]
Iselin, New Jersey 08830            | http://www.enterprisedb.com/



Re: [PERFORM] Delete operation VERY slow...

2006-06-16 Thread Tom Lane
David Leangen <[EMAIL PROTECTED]> writes:
> The only inconvenience is that I need to remove all the foreign key
> constraints before truncating, then put them back after.

I was about to ask if you had any.  Usually the reason for DELETE being
slow is that you have foreign key references to (not from) the table and
the referencing columns aren't indexed.  This forces a seqscan search
of the referencing table for each row deleted :-(
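
For example (hypothetical table names), indexing the referencing column is
the usual fix:

    -- Given orders.customer_id REFERENCES customers(id), deleting from
    -- customers seqscans orders for every deleted row unless this exists:
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);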

regards, tom lane



Re: [PERFORM] Optimizer internals

2006-06-16 Thread Greg Stark
"Jonah H. Harris" <[EMAIL PROTECTED]> writes:

> Now, if we're considering UPDATES (the worst case for PostgreSQL's
> current MVCC architecture), then this is (IMHO) a true statement.
> There aren't many *successful* commercial databases that incur the
> additional overhead of creating another version of the record, marking
> the old one as having been updated, inserting N-number of new index
> entries to point to said record, and having to WAL-log all
> aforementioned changes.  

Well, Oracle has to do almost all that same work; it's just doing it in a
separate place called a rollback segment. There are pros and cons, especially
when it comes to indexes, but also regarding what happens when the
new record is larger than the old one.

> I've done a good amount of research on enhancing PostgreSQL's MVCC in UPDATE
> conditions and believe there is a nice happy medium for us.

IMHO the biggest problem Postgres has is when you're updating a lot of records
in a table with little free space. Postgres has to keep jumping back and forth
between the old records it's reading in and the new records it's writing out.
That can in theory turn a simple linear update scan into an O(n^2) operation.
In practice read-ahead and caching should help but I'm not clear to what
extent.

That and of course the visibility bitmap that has been much-discussed that
might make vacuum not have to visit every page and allow index scans to skip
checking visibility info for some pages would be major wins.

> /me waits for the obligatory and predictable, "the benchmarks are
> flawed" response.

I wouldn't say the benchmarks are flawed, but I also don't think you can point
to any specific design feature and say it's essential just on the basis of
bottom-line results. You have to look at the actual benefit of the specific wins.

Oracle and the others all implement tons of features intended to optimize
applications like the benchmarks (and the benchmarks specifically of course:)
that have huge effects on the results. Partitioned tables, materialized views,
etc allow algorithmic improvements that do much more than any low level
optimizations can do.

-- 
greg




Re: [PERFORM] Optimizer internals

2006-06-16 Thread Jonah H. Harris

On 16 Jun 2006 07:23:26 -0400, Greg Stark <[EMAIL PROTECTED]> wrote:

The flip side is that Oracle and others like it have to
do a lot of extra footwork to do if you query data
that hasn't been committed yet. That footwork
has performance implications.


Not disagreeing here at all, but considering that Oracle, DB2, and SQL
Server, et al have proven themselves to perform extremely well under
heavy load (in multiple benchmarks), the overhead of an UNDO
implementation has a calculable break even point.

Feel free to debate it, but the optimistic approach adopted by nearly
every commercial database vendor is *generally* a better approach for
OLTP.

Consider Weikum & Vossen (p. 442):

We also need to consider the extra work that the recovery algorithm
incurs during normal operation.  This is exactly the catch with the
class of no-undo/no-redo algorithms.  By and large, they come at the
expense of a substantial overhead during normal operations that may
increase the execution cost per transaction by a factor of two or even
higher.  In other words, it reduces the achievable transaction
throughput of a given server configuration by a factor of two or more.

Now, if we're considering UPDATES (the worst case for PostgreSQL's
current MVCC architecture), then this is (IMHO) a true statement.
There aren't many *successful* commercial databases that incur the
additional overhead of creating another version of the record, marking
the old one as having been updated, inserting N-number of new index
entries to point to said record, and having to WAL-log all
aforementioned changes.  I have yet to see any successful commercial
RDBMS using some sort of no-undo algorithm that doesn't suffer the
"factor of two or more" performance reduction.  However, if you
consider an INSERT or DELETE in PostgreSQL, those are implemented much
better than in most commercial database systems due to PostgreSQL's
MVCC design.  I've done a good amount of research on enhancing
PostgreSQL's MVCC in UPDATE conditions and believe there is a nice
happy medium for us.

/me waits for the obligatory and predictable, "the benchmarks are
flawed" response.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 2nd Floor            | [EMAIL PROTECTED]
Iselin, New Jersey 08830            | http://www.enterprisedb.com/



Re: [PERFORM] how to partition disks

2006-06-16 Thread Michael Stone

On Wed, Jun 14, 2006 at 04:32:23PM +0200, Sven Geisler wrote:
For example, you run two queries with two clients and each query needs
to read some indices from disk. In this case it is more efficient to read
from different volumes than to read from one large volume where the disk
arms have to jump.


Hmm. Bad example, IMO. In the case of reading indices you're doing 
random IO and the heads will be jumping all over the place anyway. The 
limiting factor there will be seeks/s, and you'll possibly get better 
results with the larger array. (That case is fairly complicated to 
analyze and depends very much on the data.) Where multiple arrays will be 
faster is if you have a lot of sequential IO--in fact, a couple of cheap 
disks can blow away a fairly expensive array for purely sequential 
operations since each disk can handle >60MB/s of if it doesn't have to 
seek, whereas multiple sequential streams on the big array will cause 
each disk in the array to seek. (The array controller will try to hide 
this with its cache; its cache size & software will determine how 
successful it is at doing so.)


Mike Stone



Re: [PERFORM] SAN performance mystery

2006-06-16 Thread Mikael Carneholm
We've seen similar results with our EMC CX200 (fully equipped) when
compared to a single (1) SCSI disk machine. For sequential reads/writes
(import, export, updates on 5-10 30M+ row tables), performance is
downright awful. A big DB update took 5-6h in pre-prod (single SCSI),
and 10-14?h (don't recall the exact details) in production (EMC SAN).
And this was with a proprietary DB, btw - no fsync on/off affecting the
results here.

FC isn't exactly known for great bandwidth; iirc a 2Gbit FC channel tops
out at around 192MB/s. So, especially if you mostly have DW/BI type
workloads, go for DAD (Direct Attached Disks) instead.

/Mikael

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tim Allen
Sent: 15 June 2006 23:50
To: pgsql-performance@postgresql.org
Subject: [PERFORM] SAN performance mystery

We have a customer who are having performance problems. They have a
large (36G+) postgres 8.1.3 database installed on an 8-way opteron with
8G RAM, attached to an EMC SAN via fibre-channel (I don't have details
of the EMC SAN model, or the type of fibre-channel card at the moment). 
They're running RedHat ES3 (which means a 2.4.something Linux kernel).

They are unhappy about their query performance. We've been doing various
things to try to work out what we can do. One thing that has been
apparent is that autovacuum has not been able to keep the database
sufficiently tamed. A pg_dump/pg_restore cycle reduced the total
database size from 81G to 36G. Performing the restore took about 23
hours.

We tried restoring the pg_dump output to one of our machines, a
dual-core pentium D with a single SATA disk, no raid, I forget how much
RAM but definitely much less than 8G. The restore took five hours. So it
would seem that our machine, which on paper should be far less
impressive than the customer's box, does more than four times the I/O
performance.

To simplify greatly - single local SATA disk beats EMC SAN by factor of
four.

Is that expected performance, anyone? It doesn't sound right to me. Does
anyone have any clues about what might be going on? Buggy kernel
drivers? Buggy kernel, come to think of it? Does a SAN just not provide
adequate performance for a large database?

I'd be grateful for any clues anyone can offer,

Tim






Re: [PERFORM] SAN performance mystery

2006-06-16 Thread Greg Stark

"Alex Turner" <[EMAIL PROTECTED]> writes:

> Given the fact that most SATA drives have only an 8MB cache, and your RAID
> controller should have at least 64MB, I would argue that the system with the
> RAID controller should always be faster.  If it's not, you're getting
> short-changed somewhere, which is typical on linux, because the drivers just
> aren't there for a great many controllers that are out there.

Alternatively Linux is using the 1-4 gigabytes of cache available to it
effectively enough that the 64 megabytes of mostly duplicated cache just isn't
especially helpful...

I never understood why disk caches on the order of megabytes are exciting. Why
should disk manufacturers be any better about cache management than OS
authors?

In the case of RAID 5 this could actually work against you since the RAID
controller can _only_ use its cache to find parity blocks when writing.
Software raid can use all of the OS's disk cache to that end.

-- 
greg




Re: [PERFORM] Optimizer internals

2006-06-16 Thread Greg Stark
Mark Lewis <[EMAIL PROTECTED]> writes:

> On Thu, 2006-06-15 at 14:05 -0400, John Vincent wrote:
> > Now I've been told by our DBA that we should have been able to wholy
> > satisfy that query via the indexes.
> 
> DB2 can satisfy the query using only indexes because DB2 doesn't do
> MVCC.

Well it's more subtle than that. DB2 most certainly does provide MVCC
semantics as does Oracle and MSSQL and any other serious SQL implementation.

But there are different ways to implement MVCC and every database makes
decisions that have pros and cons. Postgres's implementation has some big
benefits over others (no rollback segments, no expensive recovery operations,
fast inserts and updates) but it also has disadvantages (periodic vacuums and
indexes don't cover the data).

The distinction you're looking for here is sometimes called "optimistic"
versus "pessimistic" space management. (Not locking, that's something else.)
Postgres is "pessimistic" -- it treats every transaction as if it might be
rolled back. Oracle and most others are "optimistic" -- they assume every
transaction will be committed and store information elsewhere to implement
MVCC and recover in case it's rolled back. The flip side is that Oracle and
others like it have to do a lot of extra footwork if you query data that
hasn't been committed yet. That footwork has performance implications.

-- 
greg




Re: [PERFORM] Delete operation VERY slow...

2006-06-16 Thread PFC



Wow! That was almost instantaneous. I can't believe the difference.

The only inconvenience is that I need to remove all the foreign key
constraints before truncating, then put them back after. But I suppose
it is a small price to pay for this incredible optimization.


In that case, your DELETE might have been slowed down by foreign key
checks.

Suppose you have tables A and B, and table A has a column "b_id
REFERENCES B(id)".
When you delete from B, postgres has to look up in A which rows reference
the deleted rows, in order to do the ON DELETE action you specified in the
constraint.
If you do not have an index on b_id, this can be quite slow... so you
should check that the foreign key relations that need indexes have them.
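
A minimal sketch of that exact setup, using the A/B names above:

    CREATE TABLE b (id integer PRIMARY KEY);
    CREATE TABLE a (b_id integer REFERENCES b(id) ON DELETE CASCADE);

    -- Without this index, every row deleted from b triggers a sequential
    -- scan of a to find the referencing rows.
    CREATE INDEX a_b_id_idx ON a (b_id);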




Re: [PERFORM] SAN performance mystery

2006-06-16 Thread Tim Allen

Tim Allen wrote:
We have a customer who are having performance problems. They have a 
large (36G+) postgres 8.1.3 database installed on an 8-way opteron with 
8G RAM, attached to an EMC SAN via fibre-channel (I don't have details 
of the EMC SAN model, or the type of fibre-channel card at the moment). 
They're running RedHat ES3 (which means a 2.4.something Linux kernel).


To simplify greatly - single local SATA disk beats EMC SAN by factor of 
four.


Is that expected performance, anyone? It doesn't sound right to me. Does 
anyone have any clues about what might be going on? Buggy kernel 
drivers? Buggy kernel, come to think of it? Does a SAN just not provide 
adequate performance for a large database?


I'd be grateful for any clues anyone can offer,

Tim


Thanks to all who have replied so far. I've learned a few new things in 
the meantime.


Firstly, the fibrechannel card is an Emulex LP1050. The customer seems 
to have rather old drivers for it, so I have recommended that they 
upgrade asap. I've also suggested they might like to upgrade their 
kernel to something recent too (eg upgrade to RHEL4), but no telling 
whether they'll accept that recommendation.


The fact that SATA drives are wont to lie about write completion, which 
several posters have pointed out, presumably has an effect on write 
performance (ie apparent write performance is increased at the cost of 
an increased risk of data-loss), but, again presumably, not much of an 
effect on read performance. After loading the customer's database on our 
fairly modest box with the single SATA disk, we also tested select query 
performance, and while we didn't see a factor of four gain, we certainly 
saw that read performance is also substantially better. So the fsync 
issue possibly accounts for part of our factor-of-four, but not all of 
it. Ie, the SAN is still not doing well by comparison, even allowing for 
the presumption that it is more honest.


One curious thing is that some postgres backends seem to spend an 
inordinate amount of time in uninterruptible iowait state. I found a 
posting to this list from December 2004 from someone who reported that 
very same thing. For example, bringing down postgres on the customer box 
requires kill -9, because there are invariably one or two processes so 
deeply uninterruptible as to not respond to a politer signal. That 
indicates something not quite right, doesn't it?


Tim

--
---
Tim Allen  [EMAIL PROTECTED]
Proximity Pty Ltd  http://www.proximity.com.au/



Re: [PERFORM] Delete operation VERY slow...

2006-06-16 Thread David Leangen

Wow! That was almost instantaneous. I can't believe the difference.

The only inconvenience is that I need to remove all the foreign key
constraints before truncating, then put them back after. But I suppose
it is a small price to pay for this incredible optimization.
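
For reference, a rough sketch of the workaround (constraint and table names
are hypothetical):

    -- Drop the constraint that references the table being truncated.
    ALTER TABLE child DROP CONSTRAINT child_parent_id_fkey;

    TRUNCATE TABLE parent;

    -- Re-create the constraint afterwards.
    ALTER TABLE child
        ADD CONSTRAINT child_parent_id_fkey
        FOREIGN KEY (parent_id) REFERENCES parent (id);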


Thank you!



On Fri, 2006-06-16 at 12:52 +0530, Gourish Singbal wrote:
>  
> David,
>  
> Truncate table would be a good idea if you want to delete all the data
> in the table.
> You need not perform vacuum in this case since there are no dead rows
> created.
>  
> ~gourish
> 
>  
> On 6/16/06, David Leangen <[EMAIL PROTECTED]> wrote: 
> 
> Hello!
> 
> I am trying to delete an entire table. There are about 41,000
> rows in
> the table (based on count(*)). 
> 
> I am using the SQL command: delete from table;
> 
> The operation seems to take in the order of hours, rather than
> seconds
> or minutes.
> 
> "Explain delete from table" gives me:
> 
>                            QUERY PLAN
> --------------------------------------------------------------
>  Seq Scan on table  (cost=0.00..3967.74 rows=115374 width=6)
> (1 row)
> 
> 
> I am using an Intel Pentium D 2.8GHz CPU. My system has about
> 1.2GB of
> RAM. This should be ok... my database isn't that big, I think.
> 
> 
> Any ideas why this takes so long and how I could speed this
> up?
> 
> Or alternatively, is there a better way to delete all the
> contents from 
> a table?
> 
> 
> Thank you!
> 
> 
> 
> 
> 
> 
> -- 
> Best,
> Gourish Singbal 




Re: [PERFORM] SAN performance mystery

2006-06-16 Thread Stefan Kaltenbrunner
Tim Allen wrote:
> We have a customer who are having performance problems. They have a
> large (36G+) postgres 8.1.3 database installed on an 8-way opteron with
> 8G RAM, attached to an EMC SAN via fibre-channel (I don't have details
> of the EMC SAN model, or the type of fibre-channel card at the moment).
> They're running RedHat ES3 (which means a 2.4.something Linux kernel).
> 
> They are unhappy about their query performance. We've been doing various
> things to try to work out what we can do. One thing that has been
> apparent is that autovacuum has not been able to keep the database
> sufficiently tamed. A pg_dump/pg_restore cycle reduced the total
> database size from 81G to 36G. Performing the restore took about 23 hours.

Hi Tim!

To give you some comparison - we have a similarly sized database here
(~38GB after a fresh restore and ~76GB after some months into
production). The server is a 4-core Opteron @2.4GHz with 16GB RAM,
connected via 2 QLogic 2Gbit HBAs to the SAN (IBM DS4300 Turbo).

It took us quite a while to get this combination up to speed, but a full
dump&restore cycle (via a pg_dump | psql pipe over the net) now takes
only about an hour.
23 hours or even 5 hours sounds really excessive - I'm wondering about
some basic issues with the SAN.
If you are using any kind of multipathing (most likely the one in the
QLA drivers), I would at first assume that you are playing ping-pong
between the controllers (ie the FC cards send IO to more than one
SAN head, causing those to fail over constantly, completely destroying
performance).
ES3 is rather old too, and I don't think that even their hacked-up kernel
is very good at driving a large Opteron SMP box (2.6 should be MUCH
better in that regard).

Other than that - how well is your postgresql instance tuned to your
hardware ?


Stefan



Re: [PERFORM] Delete operation VERY slow...

2006-06-16 Thread Gourish Singbal
 
David,
 
TRUNCATE TABLE would be a good idea if you want to delete all the data in the table.
You need not perform a vacuum in this case, since no dead rows are created.
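
A minimal sketch (hypothetical table names; note that 8.1 lets you list
several tables in one statement, which helps when foreign keys reference
the table being emptied):

    TRUNCATE TABLE parent;         -- fails if another table references parent
    TRUNCATE TABLE parent, child;  -- truncates both, so the FKs stay valid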
 
~gourish 
On 6/16/06, David Leangen <[EMAIL PROTECTED]> wrote:

> Hello!
>
> I am trying to delete an entire table. There are about 41,000 rows in
> the table (based on count(*)).
>
> I am using the SQL command: delete from table;
>
> The operation seems to take in the order of hours, rather than seconds
> or minutes.
>
> "Explain delete from table" gives me:
>
>                            QUERY PLAN
> --------------------------------------------------------------
>  Seq Scan on table  (cost=0.00..3967.74 rows=115374 width=6)
> (1 row)
>
> I am using an Intel Pentium D 2.8GHz CPU. My system has about 1.2GB of
> RAM. This should be ok... my database isn't that big, I think.
>
> Any ideas why this takes so long and how I could speed this up?
>
> Or alternatively, is there a better way to delete all the contents from
> a table?
>
> Thank you!

--
Best,
Gourish Singbal


Re: [PERFORM] Delete operation VERY slow...

2006-06-16 Thread A. Kretschmer
On 16.06.2006 at 15:58:46 +0900, David Leangen wrote:
> 
> Hello!
> 
> I am trying to delete an entire table. There are about 41,000 rows in
> the table (based on count(*)).
> 
> I am using the SQL command: delete from table;

Use TRUNCATE table.


Andreas
-- 
Andreas Kretschmer (contact: see header)
Heynitz:  035242/47215,  D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
 ===Schollglas Unternehmensgruppe=== 



[PERFORM] Delete operation VERY slow...

2006-06-16 Thread David Leangen

Hello!

I am trying to delete an entire table. There are about 41,000 rows in
the table (based on count(*)).

I am using the SQL command: delete from table;

The operation seems to take in the order of hours, rather than seconds
or minutes.

"Explain delete from table" gives me:

                           QUERY PLAN
--------------------------------------------------------------
 Seq Scan on table  (cost=0.00..3967.74 rows=115374 width=6)
(1 row)


I am using an Intel Pentium D 2.8GHz CPU. My system has about 1.2GB of
RAM. This should be ok... my database isn't that big, I think.


Any ideas why this takes so long and how I could speed this up?

Or alternatively, is there a better way to delete all the contents from
a table?


Thank you!


