Re: [PERFORM] ECC RAM really needed?

2007-05-27 Thread Andrew Sullivan
On Sat, May 26, 2007 at 10:52:14AM -0400, [EMAIL PROTECTED] wrote:
 Do you want to be the one person who does notice a problem? :-)

Right, and note that the moment you notice the problem _may not_ be
when it happens.  The problem with errors in memory (or on disk
controllers, another place not to skimp in your hardware budget for
database machines) is that the unnoticed failure could well write
corrupted data out.  It's some time later that you notice you have
the problem, when you go to look at the data and discover you have
garbage.

If your data is worth storing, it's worth storing correctly, and so
doing things to improve the chances of correct storage is a good
idea.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Everything that happens in the world happens at some place.
--Jane Jacobs 

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-27 Thread Jim C. Nasby
On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote:
 This does not run a complete sort on the table. It would be about as
 fast as your seq scan disk throughput. Obviously, the end result is not
 as good as a real CLUSTER since the table will be made up of several
 ordered chunks. Therefore, a range lookup on the clustered columns would
 need at most N seeks, versus 1 for a really clustered table. But it only
 scans the table once and writes it once, even counting index rebuild.

Do you have any data that indicates such an arrangement would be
substantially better than less-clustered data?
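For concreteness, one way to read the proposal is a copy performed in bounded, individually sorted chunks instead of one global sort. The sketch below is illustrative only; table and column names are invented and the chunking here is by key range rather than by work_mem-sized sort runs:

```sql
-- Hypothetical "fast cluster": copy the table in several sorted chunks.
CREATE TABLE posts_new (LIKE posts);

INSERT INTO posts_new
SELECT * FROM posts
WHERE topic_id BETWEEN 1 AND 10000          -- chunk 1
ORDER BY topic_id, post_id;

INSERT INTO posts_new
SELECT * FROM posts
WHERE topic_id BETWEEN 10001 AND 20000      -- chunk 2, and so on...
ORDER BY topic_id, post_id;

-- Each chunk is internally ordered, so a range lookup on the clustered
-- columns needs at most one seek per ordered chunk (the "N seeks").
```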
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Domains versus Check Constraints

2007-05-27 Thread Jim C. Nasby
On Tue, May 22, 2007 at 12:56:21PM -0400, Chander Ganesan wrote:
 Are there any performance improvements that come from using a domain 
 over a check constraint (aside from the ease of management component)?

No. Plus support for domain constraints isn't universal (plpgsql doesn't
honor them, for example).
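For reference, the two alternatives being compared look roughly like this (illustrative names, not from the original thread):

```sql
-- A domain centralizes the constraint definition and can be reused:
CREATE DOMAIN posint AS integer CHECK (VALUE > 0);
CREATE TABLE orders_d (qty posint);

-- ...while a check constraint is declared per table:
CREATE TABLE orders_c (qty integer CHECK (qty > 0));
```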
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Simulate database fragmentation

2007-05-27 Thread Jim C. Nasby
On Wed, May 23, 2007 at 11:58:06AM -0700, Y Sidhu wrote:
 Is there any easy way to take a database and add/delete records to create
 fragmentation of the records and indexes. I am trying to recreate high
 vacuum times.

Update random rows, then do a vacuum. That will result in free space in
random locations. At that point you'd probably want to update some
ranges of rows, enough so that they get forced to new pages.

A better idea might be to just insert random data.
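A minimal sketch of that recipe (all object names invented):

```sql
-- Build a table, scatter dead tuples across many pages, then vacuum so
-- the reclaimed free space ends up in random locations.
CREATE TABLE frag_test (id serial PRIMARY KEY, payload text);
INSERT INTO frag_test (payload)
SELECT repeat('x', 500) FROM generate_series(1, 100000);

UPDATE frag_test SET payload = repeat('y', 500)
WHERE random() < 0.10;          -- ~10% of rows, at random positions
VACUUM frag_test;

-- Then update contiguous ranges so those rows migrate to new pages:
UPDATE frag_test SET payload = repeat('z', 500)
WHERE id BETWEEN 1000 AND 2000;
```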
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Memory allocation and Vacuum abends

2007-05-27 Thread Jim C. Nasby
What does top report as using the most memory?

On Wed, May 23, 2007 at 11:01:24PM -0300, Leandro Guimarães dos Santos wrote:
 Hi all,
 
 I have a 4-CPU, 4 GB RAM box running PostgreSQL 8.2.3 under Windows 2003
 with a very IO-intensive insert application.
 
 The application inserts about 570 rows per minute, or roughly 9.5 rows
 per second.
 
 We have been facing a memory problem that we cannot understand.
 
 From time to time memory usage goes very high, and even after we stop the
 PostgreSQL service the memory stays allocated; if we then restart the
 service, Postgres crashes.
 
 The database is already 5 GB in size and was created a month and a half
 ago. We have two main tables, both partitioned.
 
 Below is an excerpt from the log file. Does anyone have any idea what the
 problem could be?
 
 Thanks in advance.
 
 2007-05-23 13:21:00 LOG:  CreateProcess call failed: A blocking operation
 was interrupted by a call to WSACancelBlockingCall. (error code 1450)
 2007-05-23 13:21:00 LOG:  could not fork new process for connection: A
 blocking operation was interrupted by a call to WSACancelBlockingCall.
 2007-05-23 13:21:06 LOG:  could not receive data from client: An operation
 on a socket could not be performed because the system lacked sufficient
 buffer space or because a queue was full.
 2007-05-23 13:21:17 LOG:  server process (PID 256868) exited with exit
 code 128
 2007-05-23 13:21:17 LOG:  terminating any other active server processes
 2007-05-23 13:21:17 WARNING:  terminating connection because of crash of
 another server process
 2007-05-23 13:21:17 DETAIL:  The postmaster has commanded this server
 process to roll back the current transaction and exit, because another
 server process exited abnormally and possibly corrupted shared memory.
 2007-05-23 13:21:17 HINT:  In a moment you should be able to reconnect to
 the database and repeat your command.
 [the same WARNING/DETAIL pair repeats twice more for other backends]
 

-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Performance Problem with Vacuum of bytea table (PG 8.0.13)

2007-05-27 Thread Jim C. Nasby
On Fri, May 25, 2007 at 10:29:30AM +0200, Bastian Voigt wrote:
 Hi *,
 for caching large autogenerated XML files, I have created a bytea table 
 in my database so that the cached files can be used by multiple servers. 
 There are about 500 rows and 10-20 Updates per minute on the table. The 
 files stored in the bytea are anything from 10kB to 10MB. My PostgreSQL 
 version is 8.0.13 on Gentoo Linux (x86) with PostGIS 1.2.0.
 
 For vacuum I use the pg_autovacuum daemon. It decided to vacuum my cache 
 table about every 3 hours, the vacuum process takes 20-30 minutes 
 (oops!) every time.

You'll want to decrease autovacuum_vacuum_scale_factor to 0.2 if you're
on anything less than 8.2.
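On 8.1 and later that is a postgresql.conf setting; on 8.0, where the contrib pg_autovacuum daemon is in use, the equivalent is a command-line option (flag shown from memory, so verify against your version's docs):

```
# postgresql.conf (8.1 and later):
autovacuum_vacuum_scale_factor = 0.2

# contrib pg_autovacuum (8.0), roughly equivalent:
#   pg_autovacuum -V 0.2 ...
```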
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Domains versus Check Constraints

2007-05-27 Thread Stefan Kaltenbrunner
Jim C. Nasby wrote:
 On Tue, May 22, 2007 at 12:56:21PM -0400, Chander Ganesan wrote:
 Are there any performance improvements that come from using a domain 
 over a check constraint (aside from the ease of management component)?
 
 No. Plus support for domain constraints isn't universal (plpgsql doesn't
 honor them, for example).

Since 8.2, domain constraints are enforced everywhere ...


Stefan



Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-27 Thread PFC
On Sun, 27 May 2007 17:53:38 +0200, Jim C. Nasby [EMAIL PROTECTED] wrote:
 On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote:
  This does not run a complete sort on the table. It would be about as
  fast as your seq scan disk throughput. Obviously, the end result is not
  as good as a real CLUSTER since the table will be made up of several
  ordered chunks. Therefore, a range lookup on the clustered columns would
  need at most N seeks, versus 1 for a really clustered table. But it only
  scans the table once and writes it once, even counting index rebuild.
 
 Do you have any data that indicates such an arrangement would be
 substantially better than less-clustered data?


	While the little benchmark that will answer your question is running,
I'll add a few comments:


	I have been creating a new benchmark for PostgreSQL and MySQL, which I
will call the Forum Benchmark. It mimics the activity of a forum.
	So far, I have got interesting results about Postgres and InnoDB and
will publish an extensive report with lots of nasty stuff in it in, say,
2 weeks, since I'm doing this in my spare time.

	Anyway, forums like clustered tables, specifically clustering posts on
(topic_id, post_id), in order to be able to display a page with one disk
seek instead of one seek per post.
	PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster,
since I run it on a dual core; InnoDB uses only one core). However, InnoDB
can automatically cluster tables without maintenance. This means InnoDB
will, even though it sucks and is awfully bloated, run a lot faster than
Postgres if things become IO-bound, i.e. if the dataset is larger than
RAM.
	Postgres needs to cluster the posts table in order to keep going.
CLUSTER is very slow. I tried inserting into a new posts table, ordering
by (post_id, topic_id), then renaming the new table in place of the old.
It is faster, but still slow when handling lots of data.
	I am trying other approaches, some quite hack-ish, and will report my
findings.
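The insert-and-rename trick described above would look something like this, illustrated with the (topic_id, post_id) ordering a forum workload wants; names are invented, and indexes and constraints must be recreated by hand on older versions:

```sql
BEGIN;
CREATE TABLE posts_new (LIKE posts);
INSERT INTO posts_new SELECT * FROM posts ORDER BY topic_id, post_id;
DROP TABLE posts;
ALTER TABLE posts_new RENAME TO posts;
COMMIT;
-- Rebuild indexes afterwards, e.g.:
CREATE INDEX posts_topic_idx ON posts (topic_id, post_id);
```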


Regards




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-27 Thread Alexander Staubo

On 5/27/07, PFC [EMAIL PROTECTED] wrote:

PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster
since I run it on dual core ; InnoDB uses only one core). However, InnoDB
can automatically cluster tables without maintenance.


How does it know what to cluster by? Does it gather statistics about
query patterns on which it can decide an optimal clustering, or does
it merely follow a clustering previously set up by the user?

Alexander.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-27 Thread PFC




How does it know what to cluster by? Does it gather statistics about
query patterns on which it can decide an optimal clustering, or does
it merely follow a clustering previously set up by the user?


	Nothing fancy: InnoDB ALWAYS clusters on the primary key, whatever it
is. So, if you can hack your stuff into having a primary key that
clusters nicely, good for you. If not, well...
	So, I used (topic_id, post_id) as the PK, even though it isn't the real
PK (which should be post_id)...
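In MySQL DDL terms, that workaround would look roughly like this (a sketch; the column list is invented):

```sql
CREATE TABLE posts (
  topic_id INT NOT NULL,
  post_id  INT NOT NULL,
  body     TEXT,
  PRIMARY KEY (topic_id, post_id),  -- InnoDB stores rows in this order
  UNIQUE KEY post_id_key (post_id)  -- the "real" key, kept as a secondary index
) ENGINE=InnoDB;
```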




Re: [PERFORM] Postgres Benchmark Results

2007-05-27 Thread Chris


 I am re-running it with other tuning, notably cost-based vacuum
 delay and less frequent checkpoints, and it is a *lot* smoother.
 These take a full night to run, so I'll post more results when I
 have useful stuff to show.
 
 This has proven to be a very interesting trip to benchmarkland...


[ rather late in my reply but I had to ]

Are you tuning MySQL in a similar fashion?

--
PostgreSQL & PHP tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match