Re: [PERFORM] why does swap not recover?

2010-03-26 Thread Craig James

On 3/26/10 4:57 PM, Richard Yen wrote:

Hi everyone,

We've recently encountered some swapping issues on our CentOS 64GB Nehalem 
machine, running postgres 8.4.2.  Unfortunately, I was foolish enough to set 
shared_buffers to 40GB.  I was wondering if anyone would have any insight into 
why the swapping suddenly starts, but never recovers?

http://richyen.com/i/swap.png

Note, the machine has been up and running since mid-December 2009.  It was only 
on March 8 that this swapping began, and it has never recovered.

If we look at dstat, we find the following:

http://richyen.com/i/dstat.png

Note that it is constantly paging in, but never paging out.


This happens when you have too many processes using too much space to fit in 
real memory, but none of them are changing their memory image.  If the system 
swaps a process in, but that process doesn't change anything in memory, then 
there are no dirty pages and the kernel can just kick the process out of memory 
without writing anything back to the swap disk -- the data in the swap are 
still valid.

It's a classic problem when processes are running round-robin. Say you have 
space for 100 processes, but you're running 101 processes.  When you get to 
#101, #1 is the oldest, so it gets swapped out.  Then #1 runs, and #2 is the 
oldest, so it gets kicked out.  Then #2 runs and kicks out #3 ... and so forth.  
Going from 100 to 101 processes brings the system nearly to a halt.

Some operating systems try to use tricks to keep this from happening, but it's 
a hard problem to solve.

Craig



Re: [PERFORM] why does swap not recover?

2010-03-26 Thread Scott Carey
On Mar 26, 2010, at 4:57 PM, Richard Yen wrote:

> Hi everyone,
> 
> We've recently encountered some swapping issues on our CentOS 64GB Nehalem 
> machine, running postgres 8.4.2.  Unfortunately, I was foolish enough to set 
> shared_buffers to 40GB.  I was wondering if anyone would have any insight 
> into why the swapping suddenly starts, but never recovers?
> 
> http://richyen.com/i/swap.png
> 
> Note, the machine has been up and running since mid-December 2009.  It was 
> only on March 8 that this swapping began, and it has never recovered.
> 
> If we look at dstat, we find the following:
> 
> http://richyen.com/i/dstat.png
> 
> Note that it is constantly paging in, but never paging out.  This would 
> indicate that it's constantly reading from swap, but never writing out to it. 
>  Why would postgres do this? (postgres is pretty much the only thing running 
> on this machine).
> 
> I'm planning on lowering the shared_buffers to a more sane value, like 25GB 
> (pgtune recommends this for a Mixed-purpose machine) or less (pgtune 
> recommends 14GB for an OLTP machine).  However, before I do this (and 
> possibly resolve the issue), I was hoping to see if anyone would have an 
> explanation for the constant reading from swap, but never writing back.

Linux until recently did not account for shared memory properly in its swap 
'aggressiveness' decisions.
Setting shared_buffers larger than about 35% of RAM is asking for trouble.

You could try adjusting the 'swappiness' setting on the fly and seeing how the 
system reacts, but one consequence of that is trading disk swapping for kswapd 
burning tons of CPU, which causes other trouble.

Either use one of the latest kernel versions (I forget which one addressed the 
memory accounting issues, and I haven't tried it myself), or turn shared_buffers 
down.  I recommend trying 10GB or so to start.
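
A minimal sketch of the change (the 10GB figure is just a starting point, not a 
tested value for your workload): check the current setting from psql, then lower 
it in postgresql.conf and restart postgres.

  SHOW shared_buffers;    -- confirm what the server is actually running with

  -- then in postgresql.conf (takes effect after a restart):
  -- shared_buffers = 10GB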

> 
> --Richard




Re: [PERFORM] Block at a time ...

2010-03-26 Thread Scott Carey

On Mar 22, 2010, at 4:46 PM, Craig James wrote:

> On 3/22/10 11:47 AM, Scott Carey wrote:
>> 
>> On Mar 17, 2010, at 9:41 AM, Craig James wrote:
>> 
>>> On 3/17/10 2:52 AM, Greg Stark wrote:
>>>> On Wed, Mar 17, 2010 at 7:32 AM, Pierre C wrote:
>>>>> I was thinking of something like that, except that the factor I'd use
>>>>> would be something like 50% or 100% of current size, capped at (say) 1 GB.
>>>>
>>>> This turns out to be a bad idea. One of the first things Oracle DBAs
>>>> are told to do is change this default setting to allocate some
>>>> reasonably large fixed size rather than scaling upwards.
>>>>
>>>> This might be mostly due to Oracle's extent-based space management, but
>>>> I'm not so sure. Recall that the filesystem is probably doing some
>>>> rounding itself. If you allocate 120kB it's probably allocating 128kB
>>>> itself anyway. Having two layers rounding up will result in odd
>>>> behaviour.
>>>>
>>>> In any case I was planning on doing this a while back. Then I ran some
>>>> experiments and couldn't actually demonstrate any problem. ext2 seems
>>>> to do a perfectly reasonable job of avoiding this problem. All the
>>>> files were mostly large contiguous blocks after running some tests --
>>>> IIRC running pgbench.
>>> 
>>> This is one of the more-or-less solved problems in Unix/Linux.  Ext* file 
>>> systems have a "reserve" usually of 10% of the disk space that nobody 
>>> except root can use.  It's not for root, it's because with 10% of the disk 
>>> free, you can almost always do a decent job of allocating contiguous blocks 
>>> and get good performance.  Unless Postgres has some weird problem that 
>>> Linux has never seen before (and that wouldn't be unprecedented...), 
>>> there's probably no need to fool with file-allocation strategies.
>>> 
>>> Craig
>>> 
>> 
>> It's fairly easy to break.  Just do a parallel import with, say, 16 concurrent 
>> tables being written to at once.  Result?  Fragmented tables.
> 
> Is this from real-life experience?  With fragmentation, there's a point of 
> diminishing return.  A couple head-seeks now and then hardly matter.  My 
> recollection is that even when there are lots of concurrent processes running 
> that are all making files larger and larger, the Linux file system still can 
> do a pretty good job of allocating mostly-contiguous space.  It doesn't just 
> dumbly allocate from some list, but rather tries to allocate in a way that 
> results in pretty good "contiguousness" (if that's a word).
> 
> On the other hand, this is just from reading discussion groups like this one 
> over the last few decades; I haven't tried it...
> 

Well, how fragmented is too fragmented depends on the use case and the hardware 
capability.  In real-world use, which for me means about 20 phases of large 
bulk inserts a day and not a lot of updates or index maintenance, the system 
gets somewhat fragmented but it's not too bad.  I did a dump/restore in 8.4 with 
parallel restore and it was much slower than usual; a single-threaded restore 
was much faster.  The dev environments are on ext3 and we see this pretty 
clearly -- but poor OS tuning can mask it (readahead parameter not set high 
enough).  This is CentOS 5.4/5.3; perhaps later kernels are better at 
scheduling file writes to avoid this.  We also use the deadline scheduler, which 
helps a lot on concurrent reads but might be messing up concurrent writes.

On production with xfs this was also bad at first -- in fact worse, because 
xfs's default 'allocsize' setting is 64k, so files were regularly fragmented 
in small multiples of 64k.  Changing the 'allocsize' parameter to 80MB made 
the restore process produce files with fragment sizes of 80MB.  80MB is big for 
most systems, but this array does over 1000MB/sec sequential read at peak, and 
only 200MB/sec with moderate fragmentation.

A larger allocsize won't cause allocation failures due to 'reservations' from 
the delayed allocation; it just means xfs won't choose to create a new file or 
extent within 80MB of another open file unless it has to.  This can cause 
performance problems if you have lots of small files, which is why the default 
is 64k.



> Craig





Re: [PERFORM] pg_dump far too slow

2010-03-26 Thread Scott Carey
On Mar 21, 2010, at 8:50 AM, David Newall wrote:

> Tom Lane wrote:
>> I would bet that the reason for the slow throughput is that gzip
>> is fruitlessly searching for compressible sequences.  It won't find many.
>> 
> 
> 
> Indeed, I didn't expect much reduction in size, but I also didn't expect 
> a twenty-fold increase in run-time (i.e. output at 
> 10MB/second going down to 500KB/second), particularly as my estimate was 
> based on gzipping a previously gzipped file.  I think it's probably 
> pathological data, as it were.  Might even be of interest to gzip's 
> maintainers.
> 

gzip -9 is known to be very inefficient.  It is hardly ever more compact 
than -7, and often 2x slower or worse.
It's almost never worth using unless you don't care how long the compression 
takes.

Try -Z1

At compression level 1 the output is often compressed well enough, at 
rather fast speeds.  It is about 6x as fast as gzip -9 and typically creates 
result files 10% larger.

For some compression/decompression speed benchmarks see:

http://tukaani.org/lzma/benchmarks.html







Re: [PERFORM] why does swap not recover?

2010-03-26 Thread Scott Marlowe
On Fri, Mar 26, 2010 at 5:57 PM, Richard Yen  wrote:
> Hi everyone,
>
> We've recently encountered some swapping issues on our CentOS 64GB Nehalem

What version of CentOS?  How up to date is it?  Are there any other
settings that aren't at their defaults in things like /etc/sysctl.conf?



[PERFORM] why does swap not recover?

2010-03-26 Thread Richard Yen
Hi everyone,

We've recently encountered some swapping issues on our CentOS 64GB Nehalem 
machine, running postgres 8.4.2.  Unfortunately, I was foolish enough to set 
shared_buffers to 40GB.  I was wondering if anyone would have any insight into 
why the swapping suddenly starts, but never recovers?

http://richyen.com/i/swap.png

Note, the machine has been up and running since mid-December 2009.  It was only 
on March 8 that this swapping began, and it has never recovered.

If we look at dstat, we find the following:

http://richyen.com/i/dstat.png

Note that it is constantly paging in, but never paging out.  This would 
indicate that it's constantly reading from swap, but never writing out to it.  
Why would postgres do this? (postgres is pretty much the only thing running on 
this machine).

I'm planning on lowering the shared_buffers to a more sane value, like 25GB 
(pgtune recommends this for a Mixed-purpose machine) or less (pgtune recommends 
14GB for an OLTP machine).  However, before I do this (and possibly resolve the 
issue), I was hoping to see if anyone would have an explanation for the 
constant reading from swap, but never writing back.

--Richard


Re: [PERFORM] Why Wal_buffer is 64KB

2010-03-26 Thread Scott Marlowe
On Fri, Mar 26, 2010 at 7:43 AM, Pierre C  wrote:
>
>> After fsync/synchronous_commit off
>
> Do not use fsync off, it is not safe. Who cares about the performance of
> fsync=off, when in practice you'd never use it with real data?
> synchronous_commit=off is fine for some applications, though.

There are situations where it's ok, when all the data are
reproducible from other sources, etc.  For instance, I have a
reporting server that is a slony slave that runs with fsync off.  If
it does crash, I can recreate the node in an hour or so and be back
online.  With fsync on, the machine is too slow to do its job, and
it's not the primary repo of the real data, so it's ok there.



Re: [PERFORM] Why Wal_buffer is 64KB

2010-03-26 Thread Pierre C



> After fsync/synchronous_commit off


Do not use fsync off, it is not safe. Who cares about the performance of  
fsync=off, when in practice you'd never use it with real data?

synchronous_commit=off is fine for some applications, though.
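
For example (a rough sketch -- the table name is made up), synchronous_commit 
can be relaxed only for the sessions or transactions that can afford to lose 
their most recent commits after a crash:

  SET synchronous_commit = off;    -- current session only

  -- or just for one transaction:
  BEGIN;
  SET LOCAL synchronous_commit = off;
  INSERT INTO app_log (msg) VALUES ('cheap, losable write');  -- hypothetical table
  COMMIT;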

More info is needed about your configuration (hardware, drives, memory,  
etc).




Re: [PERFORM] Performance Tuning Large PL/PGSQL Stored Procedure

2010-03-26 Thread Richard Huxton

On 26/03/10 03:56, Eliot Gable wrote:


> I really am chasing milliseconds here, and I appreciate all your feedback.
> You've given me a relatively large number of possible optimizations I can
> try out. I will definitely try out the libpqtypes. That sounds like a
> promising way to further cut down on execution time. I think most of my
> performance penalty is in transferring the results back to the C++
> application.


In addition to all of Merlin's good advice, if the client is on a 
different machine to the server then try sticking wireshark or similar 
onto the connection. That should make it pretty clear where the main 
costs are in getting your data back.


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Performance Tuning Large PL/PGSQL Stored Procedure

2010-03-26 Thread Merlin Moncure
On Thu, Mar 25, 2010 at 11:56 PM, Eliot Gable wrote:
>>
>> How many rows min/max/avg are coming back in your refcursors?  Are you
>> using cursors in order to return multiple complex data structures
>> (sets, etc) in a single function call?
>>
>
> I think the largest number of rows is around 40. Most are substantially
> smaller. However, most of them have about two dozen or more columns, and I
> have already shortened the list of columns to the minimum possible. The
> average number of rows is around 10, but the largest sets of rows also have
> the most columns. I'm using the cursors in order to obtain multiple complex
> data structures in a single function call.

ok, small sets.  yes, passing them back to the client as arrays is
probably going to be faster.  It's a trivial change to your proc.  you
have to define a type for your array element the way we are going to
use it.  you can use a composite type or a table (I prefer a table).

create table mystuff_t
(
  a text,
  b int,
  c timestamptz
);

create function myproc([...], mystuffs out mystuff_t[])
[inside proc]

replace your cursor declaration with this:

select array
(
   select (a,b,c)::mystuff_t from [...]
) into mystuffs;
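
to give a feel for the shape of the call (purely a sketch -- myproc and the 
column names are the hypothetical ones from above), the client then gets 
everything back in one row, and you can expand the array again with unnest() 
to inspect it:

  -- hypothetical call; mystuffs comes back as a single array-valued column
  select mystuffs from myproc(/* args */);

  -- expand the array back into rows, e.g. to sanity-check the contents
  select (m).a, (m).b, (m).c
  from (select unnest(mystuffs) as m from myproc(/* args */)) s;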

code an alternate version of the function, then inside libpq execute the
query in binary and discard the results, timing it and comparing against how
you run your query now, also discarding the results.  we want to time it this
way because timing it from psql includes the time to print out the array in
text format, which we can avoid with libpqtypes (which we are not going to
mess with until we know there is a reason to go in this direction).  We do
need to include the time to turn around and fetch the data from the
refcursors. If you see at least a 10-20% improvement, it warrants
further effort IMO (and say goodbye to refcursors forever).

>> WITH clauses can make your queries much easier to read and yield great
>> speedups if you need to access the table expression multiple times
>> from other parts of the query.  however, in some cases you can get
>> into trouble because a standard set of joins is going to give the
>> planner the most flexibility in terms of query optimization.
>>
>
> So far, every case I have converted to WITH clauses has resulted in more
> than double the speed (half the time required to perform the query). The
> main reason appears to be from avoiding calculating JOIN conditions multiple
> times in different parts of the query due to the UNION and EXCEPT clauses.

I have a hard time believing that unless there are other factors
compromising the planner, like bad statistics or a non-optimal query, or
you are dealing with a relatively special case.

'EXCEPT' btw is also an optimization target. maybe think about
converting to 'left join where rightcol is null' or something like
that.  not 100% sure, I think some work was done recently on except so
this advice may not be as true as it used to be, and possibly moot if
the number of rows being considered by except is very small.
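
as a rough sketch of that rewrite (table and column names are made up; note 
that EXCEPT also de-duplicates and treats NULLs as equal, so the two forms are 
not strictly equivalent -- check the semantics before swapping it in):

  -- EXCEPT form
  select id from candidates
  except
  select id from excluded;

  -- anti-join form
  select c.id
  from candidates c
  left join excluded e on e.id = c.id
  where e.id is null;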

> So, you are saying that I can return a complex type as a result which
> contains arrays of other complex types and just use my single SELECT command
> to retrieve the whole data set? That would be much simpler and I imagine
> much faster.

yes, however you will want to receive as few complex types as
possible, meaning your result set should still have multiple columns.
reducing the number of columns is not an optimization target.  in
other words, do the minimal amount of stacking necessary to allow
single query extraction of data.

> I really am chasing milliseconds here, and I appreciate all your feedback.
> You've given me a relatively large number of possible optimizations I can
> try out. I will definitely try out the libpqtypes. That sounds like a
> promising way to further cut down on execution time. I think most of my
> performance penalty is in transferring the results back to the C++
> application.

yes.  I've suggested libpqtypes to a number of people on the lists,
and you are what i'd consider the ideal candidate. libpqtypes will
completely transform the way you think about postgresql and libpq.
good luck. if you need help setting it up you can email me privately
or on the libpqtypes list.

merlin
