[PERFORM] Fw: Help me put 2 Gigs of RAM to use

2009-12-10 Thread Mark Stosberg


Hello,

PostgreSQL has served us very well powering a busy national pet
adoption website. Now I'd like to tune our setup further to get more out
of the hardware. 

What I'm noticing is that while the FreeBSD server has 4 Gigs of
memory, there are rarely ever more than 2 in use-- the memory use
graphs as being rather constant. My goal is to make good use of those 2
Gigs of memory to improve performance and reduce the CPU usage. 

The server has four 2.33 GHz processors in it, and RAIDed 15k RPM SCSI
disks.

Here are some current memory-related settings from our postgresql.conf
file. (We currently run 8.2, but are planning an upgrade to 8.4
soon). Do you see any obvious suggestions for improvement? 

I find the file a bit hard to read because of the lack of units in 
the examples, but perhaps that's already been addressed in future
versions.

 max_connections        = 400   # Seems to be enough for us
 shared_buffers         = 8192
 effective_cache_size   = 1000
 work_mem               = 4096
 maintenance_work_mem   = 160MB  

Thanks for your suggestions!

   Mark

[I tried to post this yesterday but didn't see it come through. This
message is a second attempt.]

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   m...@summersault.com     Summersault, LLC 
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .





Re: [PERFORM] Fw: Help me put 2 Gigs of RAM to use

2009-12-10 Thread Matthew Wakeling

On Thu, 10 Dec 2009, Mark Stosberg wrote:

What I'm noticing is that while the FreeBSD server has 4 Gigs of
memory, there are rarely ever more than 2 in use-- the memory use
graphs as being rather constant. My goal is to make good use of those 2
Gigs of memory to improve performance and reduce the CPU usage.


I think you'll find that the RAM is already being used quite effectively 
as disc cache by the OS. It sounds like the server is actually set up 
pretty well. You may get slightly better performance by tweaking a thing 
here or there, but the server needs some OS disc cache to perform well.



(We currently run 8.2, but are planning an upgrade to 8.4 soon).


Highly recommended.


[I tried to post this yesterday but didn't see it come through. This
message is a second attempt.]


The mailing list server will silently chuck any message whose subject 
starts with the word "help", just in case you're asking for help about 
managing the mailing list. The default behaviour is not to inform you that 
it has done so. It is highly annoying - could a list admin please consider 
changing this?


Matthew

--
I would like to think that in this day and age people would know better than
to open executables in an e-mail. I'd also like to be able to flap my arms
and fly to the moon.   -- Tim Mullen



Re: [PERFORM] big select is resulting in a large amount of disk writing by kjournald

2009-12-10 Thread Kevin Grittner
Joseph S j...@selectacast.net wrote:
 I just installed a shiny new database server with pg 8.4.1 running
 on CentOS 5.4. After using slony to replicate over my database I
 decided to do some basic performance tests to see how spiffy my
 shiny new server is.  This machine has 32G ram, over 31 of which
 is used for the system file cache.
 
 So I run select count(*) from large_table and I see in xosview a
 solid block of write activity. Runtime is 28125.644 ms for the
 first run.  The second run does not show a block of write activity
 and takes 3327.441 ms
 
As others have mentioned, this is due to hint bit updates, and doing
an explicit VACUUM after the load and before you start using the
database will avoid run-time issues.  You also need statistics, so
be sure to do VACUUM ANALYZE.
 
There is one other sneaky surprise awaiting you, however.  Since
this stuff was all loaded with a narrow range of transaction IDs,
they will all need to be frozen at about the same time; so somewhere
down the road, either during a routine database vacuum or possibly
in the middle of normal operations, all of these rows will need to
be rewritten *again* to change the transaction IDs used for managing
MVCC to the special frozen value.  We routinely follow a load with
VACUUM FREEZE ANALYZE of the database to combine the update to
freeze the tuples with the update to set the hint bits and avoid
this problem.
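Concretely, using the large_table from the example above, that post-load
step boils down to a single command run before the database goes into
service (a minimal sketch rather than anyone's exact procedure):
 
  -- right after the bulk load / replication sync finishes:
  VACUUM FREEZE ANALYZE;               -- whole database: freeze xids, set hint bits, gather stats
 
  -- or limited to the newly loaded table:
  VACUUM FREEZE ANALYZE large_table;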
 
There has been some talk about possibly writing tuples in a frozen
state with the hint bits already set if they are loaded in the same
database transaction which creates the table, but I'm not aware of
anyone currently working on this.
 
-Kevin



Re: [PERFORM] Help me put 2 Gigs of RAM to use

2009-12-10 Thread Mark Stosberg

Thanks for the response, Matthew.

 On Thu, 10 Dec 2009, Mark Stosberg wrote:
  What I'm noticing is that while the FreeBSD server has 4 Gigs of
  memory, there are rarely ever more than 2 in use-- the memory use
  graphs as being rather constant. My goal is to make good use of those 2
  Gigs of memory to improve performance and reduce the CPU usage.
 
 I think you'll find that the RAM is already being used quite effectively 
 as disc cache by the OS. It sounds like the server is actually set up 
 pretty well. You may get slightly better performance by tweaking a thing 
 here or there, but the server needs some OS disc cache to perform well.

As part of reviewing this, it appears that the OS only addresses 3 of
the 4 Gigs of memory. We'll work on our FreeBSD setup to cure that.

Here's how top reports the memory breakdown:

Mem: 513M Active, 2246M Inact, 249M Wired, 163M Cache, 112M Buf, 7176K Free
Swap: 9216M Total, 1052K Used, 9215M Free

So perhaps the OS disc cache is represented in the Inactive memory
statistic? I suppose once we have the 4th Gig of memory actually
available, that would all be going to the disk cache. 

  (We currently run 8.2, but are planning an upgrade to 8.4 soon).
 
 Highly recommended.

For performance improvements in particular?

Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   m...@summersault.com     Summersault, LLC 
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .





Re: [PERFORM] Fw: Help me put 2 Gigs of RAM to use

2009-12-10 Thread Greg Smith

Mark Stosberg wrote:
I find the file a bit hard to read because of the lack of units in 
the examples, but perhaps that's already been addressed in future
versions.

 max_connections        = 400   # Seems to be enough for us
 shared_buffers         = 8192
 effective_cache_size   = 1000
 work_mem               = 4096
 maintenance_work_mem   = 160MB
  
It's already addressed in 8.2, as you can tell from the fact that 
maintenance_work_mem is in there in an easy-to-read format.  I'm 
guessing that someone either pulled in settings from an older version, 
or used some outdated web guide to get starter settings.


To convert the rest of them, you need to know what the units for each 
parameter are.  You can find that out like this:


gsmith=# select name,setting,unit from pg_settings where name in 
('shared_buffers','effective_cache_size','work_mem');

         name         | setting | unit
----------------------+---------+------
 effective_cache_size | 16384   | 8kB
 shared_buffers       | 4096    | 8kB
 work_mem             | 1024    | kB

So your shared buffers setting is 8192 * 8K = 64MB
effective_cache_size is 8MB
work_mem is 4MB.
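Written out with explicit units, those same postgresql.conf entries would
read, for example:

  shared_buffers       = 64MB
  effective_cache_size = 8MB
  work_mem             = 4MB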

The first and last of those are reasonable but on the small side, the 
last is...not.  Increasing it won't actually use more memory on your 
server though, it will just change query plans--so you want to be 
careful about increasing it too much in one shot.


The next set of stuff you need to know about, general guidelines for 
server sizing, is at 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server


You'd probably want to put shared_buffers at a higher level based on the 
amount of RAM on your server, but I'd suggest you tune the checkpoint 
parameters along with that--just increasing the buffer space alone can 
cause problems rather than solve them if you're having checkpoints all 
the time.
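Purely as an illustration of the shape such a change takes (the numbers
below are placeholders to be validated against that guide and your own
testing, and checkpoint_completion_target only exists in 8.3 and later):

  shared_buffers               = 512MB   # needs the FreeBSD shared-memory sysctls raised to match
  effective_cache_size         = 2GB     # roughly what the OS keeps as disc cache
  checkpoint_segments          = 16      # more WAL between checkpoints
  checkpoint_completion_target = 0.7     # 8.3+ only; spreads checkpoint I/O out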


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.com




Re: [PERFORM] Fw: Help me put 2 Gigs of RAM to use

2009-12-10 Thread Robert Haas
On Thu, Dec 10, 2009 at 11:45 AM, Greg Smith g...@2ndquadrant.com wrote:
 So your shared buffers setting is 8192 * 8K = 64MB
 effective_cache_size is 8MB
 work_mem is 4MB.

 The first and last of those are reasonable but on the small side, the last
 is...not.

I believe that the second instance of the word "last" in that sentence
should have been "middle", referring to effective_cache_size.  Small
values discourage the planner from using indices in certain
situations.
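One way to see that effect is to compare plans for the same query under
different session-level values (the table and query here are invented
purely for illustration):

  SET effective_cache_size = '8MB';
  EXPLAIN SELECT * FROM pets WHERE shelter_id = 42;   -- small cache estimate makes a seqscan look cheaper

  SET effective_cache_size = '2GB';
  EXPLAIN SELECT * FROM pets WHERE shelter_id = 42;   -- index scans now look more attractive to the planner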

...Robert



Re: [PERFORM] Load experimentation

2009-12-10 Thread Ben Brehmer

Hi Andy,

Load is chugging along. We've optimized our postgres conf as much as 
possible but are seeing the inevitable I/O bottleneck. I had the same 
thought as you (converting inserts into COPYs) a while back, but 
unfortunately each file has many inserts into many different tables. 
Potentially I could rip through this with a little MapReduce job on 
50-100 nodes, which is still something I might do.


One thought we are playing with was taking advantage of 4 x 414GB EBS 
devices in a RAID0 configuration. This would spread disk writes across 4 
block devices.


Right now I'm wrapping about 1500 inserts in a transaction block. Since 
it's an I/O bottleneck, COPY statements might not give me much advantage.


It's definitely a work in progress :)

Ben


On 09/12/2009 5:31 AM, Andy Colson wrote:

On 12/07/2009 12:12 PM, Ben Brehmer wrote:

Hello All,

I'm in the process of loading a massive amount of data (500 GB). After
some initial timings, I'm looking at 260 hours to load the entire 500GB.
10 days seems like an awfully long time so I'm searching for ways to
speed this up. The load is happening in the Amazon cloud (EC2), on a
m1.large instance:
-7.5 GB memory
-4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
-64-bit platform


So far I have modified my postgresql.conf file (PostgreSQL 8.1.3). The
modifications I have made are as follows:

shared_buffers = 786432
work_mem = 10240
maintenance_work_mem = 6291456
max_fsm_pages = 300
wal_buffers = 2048
checkpoint_segments = 200
checkpoint_timeout = 300
checkpoint_warning = 30
autovacuum = off


There are a variety of instance types available in the Amazon cloud
(http://aws.amazon.com/ec2/instance-types/), including high memory and
high CPU. High memory instance types come with 34GB or 68GB of memory.
High CPU instance types have a lot less memory (7GB max) but up to 8
virtual cores. I am more than willing to change to any of the other
instance types.

Also, there is nothing else happening on the loading server. It is
completely dedicated to the load.

Any advice would be greatly appreciated.

Thanks,

Ben



I'm kind of curious, how goes the load?  Is it done yet?  Still 
looking at days'n'days to finish?


I was thinking... If the .sql files are really nicely formatted, it 
would not be too hard to whip up a perl script to run as a filter to 
change the statements into COPYs.


Each file would have to only fill one table, and only contain inserts, 
and all the insert statements would have to set the same fields.  (And 
I'm sure there could be other problems).
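As a sketch of what such a filter would aim to produce (table and column
names invented for the example), input like

  INSERT INTO pets (name, species) VALUES ('Rex', 'dog');
  INSERT INTO pets (name, species) VALUES ('Tilly', 'cat');

would be rewritten into a single COPY block with tab-separated data rows:

  COPY pets (name, species) FROM STDIN;
  Rex	dog
  Tilly	cat
  \.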


Also, just for the load, did you disable fsync?

-Andy





[PERFORM] 8.4.1 ubuntu karmic slow createdb

2009-12-10 Thread Michael Clemmons
Hey,
I've got a computer which runs both 8.3 and 8.4.  To create a db it takes 4s
for 8.3 and 9s for 8.4.  I have many unit tests which create databases all
of the time, and they now run much slower than on 8.3.  It seems much slower
than I remember; at one point I considered creating a database an instantaneous
thing.  Does anyone on the list know why this is and whether I can get it back
to normal?
-Michael


Re: [PERFORM] 8.4.1 ubuntu karmic slow createdb

2009-12-10 Thread Andres Freund
On Thursday 10 December 2009 21:41:08 Michael Clemmons wrote:
 Hey,
 I've got a computer which runs both 8.3 and 8.4.  To create a db it takes 4s
 for 8.3 and 9s for 8.4.  I have many unit tests which create databases all
 of the time, and they now run much slower than on 8.3.  It seems much slower
 than I remember; at one point I considered creating a database an instantaneous
 thing.  Does anyone on the list know why this is and whether I can get it back
 to normal?
Possibly you had fsync=off at the time?

Andres



Re: [PERFORM] 8.4.1 ubuntu karmic slow createdb

2009-12-10 Thread Michael Clemmons
I'm not sure what that means; people in my office with slower hd speeds using 8.4
can create a db in 2s vs my 8-12s.  Could using md5 instead of ident do it?

On Thu, Dec 10, 2009 at 4:56 PM, Andres Freund and...@anarazel.de wrote:

 On Thursday 10 December 2009 21:41:08 Michael Clemmons wrote:
  Hey,
  I've got a computer which runs both 8.3 and 8.4.  To create a db it takes 4s
  for 8.3 and 9s for 8.4.  I have many unit tests which create databases all
  of the time, and they now run much slower than on 8.3.  It seems much slower
  than I remember; at one point I considered creating a database an instantaneous
  thing.  Does anyone on the list know why this is and whether I can get it back
  to normal?
 Possibly you had fsync=off at the time?

 Andres



Re: [PERFORM] 8.4.1 ubuntu karmic slow createdb

2009-12-10 Thread Andres Freund
Hi,

On Thursday 10 December 2009 23:01:08 Michael Clemmons wrote:
 I'm not sure what that means; people in my office with slower hd speeds using
  8.4 can create a db in 2s vs my 8-12s.
- Possibly their config is different - they could have disabled the fsync 
parameter, which makes the database no longer crash-safe but much faster 
in some circumstances.

- Possibly you have a lot of data in your template1 database?
You could check whether

CREATE DATABASE speedtest TEMPLATE template1;  takes more time than
CREATE DATABASE speedtest TEMPLATE template0;

You should issue both multiple times to ensure caching on the template 
database doesn't play a role.
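
A quick way to check the first possibility (pg_database_size and
pg_size_pretty are standard built-in functions; the query is just an
example):

SELECT datname, pg_size_pretty(pg_database_size(datname))
  FROM pg_database
 WHERE datname IN ('template0', 'template1');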

  Could using md5 instead of ident do it?
Seems unlikely.
Is starting psql near-instantaneous? Are you using createdb or are you 
issuing CREATE DATABASE ...?

Andres



Re: [PERFORM] Load experimentation

2009-12-10 Thread Scott Carey
On 12/7/09 11:12 AM, Ben Brehmer benbreh...@gmail.com wrote:

 Thanks for the quick responses. I will respond to all questions in one email:
 
 COPY command: Unfortunately I'm stuck with INSERTS due to the nature of how
 this data was generated (Hadoop/MapReduce).

If you have control over the MapReduce output, you can have that output
result files in a format that COPY likes.

If you don't have any control over that, it's more complicated.  I use a final-
pass Hadoop map-only job to go over the output and insert into postgres
directly from the job, using the

INSERT INTO table VALUES (val1, val2, ...), (val1, val2, ...), ...

insert style from Java, with about 80 rows per insert statement and a single
transaction for about a thousand of these.  This was faster than batch
inserts.
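
Spelled out as a sketch (table and column names are placeholders, and note
that multi-row VALUES lists need PostgreSQL 8.2 or later, so they won't work
on the 8.1.3 loader mentioned earlier), each chunk the job sends looks
roughly like:

BEGIN;
INSERT INTO events (id, payload) VALUES
  (1, 'a'),
  (2, 'b'),
  (3, 'c');          -- ~80 rows per statement in practice
-- ...repeated for roughly a thousand statements...
COMMIT;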


 
 On 07/12/2009 10:39 AM, Thom Brown wrote:

  2009/12/7 Kevin Grittner kevin.gritt...@wicourts.gov

   Ben Brehmer benbreh...@gmail.com wrote:

    -7.5 GB memory
    -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
    -64-bit platform

   What OS?

    (PostgreSQL 8.1.3)

   Why use such an antiquated, buggy version?  Newer versions are
   faster.

   -Kevin

  I'd agree with trying to use the latest version you can.

  How are you loading this data?  I'd make sure you haven't got any indices,
  primary keys, triggers or constraints on your tables before you begin the
  initial load, just add them after.  Also use either the COPY command for
  loading, or prepared transactions.  Individual insert commands will just
  take way too long.

  Regards

  Thom




Re: [PERFORM] 8.4.1 ubuntu karmic slow createdb

2009-12-10 Thread Nikolas Everett
In my limited experience, ext4 as presented by Karmic is not db-friendly.  I
had to carve my swap partition into a swap partition and an xfs partition to
get better db performance.  Try fsync=off first, but if that doesn't work
then try a mini xfs.


On Thu, Dec 10, 2009 at 5:09 PM, Andres Freund and...@anarazel.de wrote:

 Hi,

 On Thursday 10 December 2009 23:01:08 Michael Clemmons wrote:
  I'm not sure what that means; people in my office with slower hd speeds using
  8.4 can create a db in 2s vs my 8-12s.
 - Possibly their config is different - they could have disabled the fsync
 parameter, which makes the database no longer crash-safe but much faster
 in some circumstances.

 - Possibly you have a lot of data in your template1 database?
 You could check whether

 CREATE DATABASE speedtest TEMPLATE template1;  takes more time than
 CREATE DATABASE speedtest TEMPLATE template0;

 You should issue both multiple times to ensure caching on the template
 database doesn't play a role.

   Could using md5 instead of ident do it?
 Seems unlikely.
 Is starting psql near-instantaneous? Are you using createdb or are you
 issuing CREATE DATABASE ...?

 Andres




Re: [PERFORM] Load experimentation

2009-12-10 Thread Scott Carey



On 12/10/09 3:29 PM, Scott Carey sc...@richrelevance.com wrote:

 On 12/7/09 11:12 AM, Ben Brehmer benbreh...@gmail.com wrote:
 
  Thanks for the quick responses. I will respond to all questions in one email:
 
  COPY command: Unfortunately I'm stuck with INSERTS due to the nature of how
  this data was generated (Hadoop/MapReduce).
 
 If you have control over the MapReduce output, you can have that output
 result files in a format that COPY likes.
 
 If you don't have any control over that, it's more complicated.  I use a final-
 pass Hadoop map-only job to go over the output and insert into postgres
 directly from the job, using the
 
 INSERT INTO table VALUES (val1, val2, ...), (val1, val2, ...), ...
 
 insert style from Java, with about 80 rows per insert statement and a single
 transaction for about a thousand of these.  This was faster than batch
 inserts.
 

I should mention that the above is a bit off.  There is an important caveat
that each of these individual tasks might run twice in Hadoop (only one will
finish -- speculative execution and retry on error).  To deal with this you
can run each job inside a single transaction so that a failure will
roll back, and likely want to turn off speculative execution.

Another option is to run only one map job, with no reduce, for this sort of
work in order to ensure duplicate data is not inserted.  We are inserting
into a temp table named uniquely per chunk first (sometimes in parallel).
Then, while holding a postgres advisory lock, we do a SELECT * FROM temp
INTO destination type of operation, which is fast.
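
A rough sketch of that pattern (the table names and the advisory lock key
are invented here; the thread doesn't show the exact commands):

-- per chunk, possibly many of these running in parallel:
CREATE TABLE staging_chunk_0001 (LIKE destination);
COPY staging_chunk_0001 FROM STDIN;          -- or the multi-row INSERTs described above

-- the final move into the real table, serialized with an advisory lock:
SELECT pg_advisory_lock(1);                  -- key 1 is arbitrary but must be shared by all loaders
INSERT INTO destination SELECT * FROM staging_chunk_0001;
SELECT pg_advisory_unlock(1);
DROP TABLE staging_chunk_0001;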

 
 