Re: [PERFORM] why does swap not recover?
On 3/26/10 4:57 PM, Richard Yen wrote:
> Hi everyone,
>
> We've recently encountered some swapping issues on our CentOS 64GB Nehalem machine, running postgres 8.4.2. Unfortunately, I was foolish enough to set shared_buffers to 40GB. I was wondering if anyone would have any insight into why the swapping suddenly starts, but never recovers?
>
> http://richyen.com/i/swap.png
>
> Note, the machine has been up and running since mid-December 2009. It was only on March 8 that this swapping began, and it's never recovered. If we look at dstat, we find the following:
>
> http://richyen.com/i/dstat.png
>
> Note that it is constantly paging in, but never paging out.

This happens when you have too many processes using too much space to fit in real memory, but none of them are changing their memory image. If the system swaps a process in, but that process doesn't change anything in memory, then there are no dirty pages and the kernel can just kick the process out of memory without writing anything back to the swap disk -- the data in the swap are still valid.

It's a classic problem when processes are running round-robin. Say you have space for 100 processes, but you're running 101 processes. When you get to #101, #1 is the oldest, so it swaps out. Then #1 runs, and #2 is the oldest, so it gets kicked out. Then #2 runs and kicks out #3 ... and so forth. Going from 100 to 101 processes brings the system nearly to a halt. Some operating systems try to use tricks to keep this from happening, but it's a hard problem to solve.

Craig

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
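[Editor's note: the round-robin thrash Craig describes can be illustrated with a toy LRU simulation. This is a hypothetical sketch, not anything from the thread: capacity for 3 resident "processes", 4 processes touched in order, so once capacity is exceeded the process needed next is always the one just evicted and every access faults.]

```shell
# Toy LRU paging simulation (illustrative only; sizes are made up).
# frames = slots in "real memory"; resident = LRU-ordered resident set.
frames=3
resident=""
fault_count=0
touch_proc() {
  p=$1
  if echo " $resident " | grep -q " $p "; then
    # hit: move process to the most-recently-used position
    resident="$(echo $resident | sed "s/\b$p\b//") $p"
  else
    # fault: bring the process in, evict the least-recently-used if over capacity
    fault_count=$((fault_count + 1))
    resident="$resident $p"
    set -- $resident
    if [ $# -gt $frames ]; then
      shift
      resident="$*"
    fi
  fi
}
# 3 round-robin passes over frames+1 processes: every access is a fault
for pass in 1 2 3; do
  for p in 1 2 3 4; do
    touch_proc $p
  done
done
echo "accesses=12 faults=$fault_count"
```

With frames=4 the same workload would fault only on the first pass; going from "fits" to "one too many" is what turns every access into disk traffic.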
Re: [PERFORM] why does swap not recover?
On Mar 26, 2010, at 4:57 PM, Richard Yen wrote:
> Hi everyone,
>
> We've recently encountered some swapping issues on our CentOS 64GB Nehalem machine, running postgres 8.4.2. Unfortunately, I was foolish enough to set shared_buffers to 40GB. I was wondering if anyone would have any insight into why the swapping suddenly starts, but never recovers?
>
> http://richyen.com/i/swap.png
>
> Note, the machine has been up and running since mid-December 2009. It was only on March 8 that this swapping began, and it's never recovered.
>
> If we look at dstat, we find the following:
>
> http://richyen.com/i/dstat.png
>
> Note that it is constantly paging in, but never paging out. This would indicate that it's constantly reading from swap, but never writing out to it. Why would postgres do this? (postgres is pretty much the only thing running on this machine).
>
> I'm planning on lowering the shared_buffers to a more sane value, like 25GB (pgtune recommends this for a Mixed-purpose machine) or less (pgtune recommends 14GB for an OLTP machine). However, before I do this (and possibly resolve the issue), I was hoping to see if anyone would have an explanation for the constant reading from swap, but never writing back.

Until recently, Linux did not account for shared memory properly in its swap 'aggressiveness' decisions. Setting shared_buffers larger than 35% of RAM is asking for trouble. You could try adjusting the 'swappiness' setting on the fly and seeing how it reacts, but one consequence of that is trading off disk swapping for kswapd using up tons of CPU, causing other trouble. Either use one of the last few kernel versions (I forget which addressed the memory accounting issues, and haven't tried it myself), or turn shared_buffers down. I recommend trying 10GB or so to start.
> --Richard
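[Editor's note: a rough sketch of the two knobs mentioned above. The 25% figure follows the pgtune mixed-workload guidance quoted in the thread; the sysctl write is shown commented out because it changes live kernel behavior and should be tested carefully.]

```shell
# Inspect the current swappiness (the Linux default is typically 60) and
# compute a conservative shared_buffers of roughly 25% of RAM. Paths are
# standard Linux; the sysctl line is a hypothetical example, left commented.
cat /proc/sys/vm/swappiness
# sysctl -w vm.swappiness=10

mem_kb=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
echo "suggested shared_buffers = $((mem_kb / 4 / 1024))MB"
```

For the 64GB machine in this thread that formula gives 16GB, which sits between Scott's 10GB suggestion and pgtune's 25GB mixed-workload figure.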
Re: [PERFORM] Block at a time ...
On Mar 22, 2010, at 4:46 PM, Craig James wrote:
> On 3/22/10 11:47 AM, Scott Carey wrote:
>> On Mar 17, 2010, at 9:41 AM, Craig James wrote:
>>> On 3/17/10 2:52 AM, Greg Stark wrote:
>>>> On Wed, Mar 17, 2010 at 7:32 AM, Pierre C wrote:
>>>>> I was thinking in something like that, except that the factor I'd use would be something like 50% or 100% of current size, capped at (say) 1 GB.
>>>>
>>>> This turns out to be a bad idea. One of the first things Oracle DBAs are told to do is change this default setting to allocate some reasonably large fixed size rather than scaling upwards. This might be mostly due to Oracle's extent-based space management, but I'm not so sure. Recall that the filesystem is probably doing some rounding itself. If you allocate 120kB it's probably allocating 128kB itself anyway. Having two layers rounding up will result in odd behaviour. In any case, I was planning on doing this a while back. Then I ran some experiments and couldn't actually demonstrate any problem. ext2 seems to do a perfectly reasonable job of avoiding this problem. All the files were mostly large contiguous blocks after running some tests -- IIRC running pgbench.
>>>
>>> This is one of the more-or-less solved problems in Unix/Linux. Ext* file systems have a "reserve", usually 10% of the disk space, that nobody except root can use. It's not for root; it's because with 10% of the disk free, you can almost always do a decent job of allocating contiguous blocks and get good performance. Unless Postgres has some weird problem that Linux has never seen before (and that wouldn't be unprecedented...), there's probably no need to fool with file-allocation strategies.
>>>
>>> Craig
>>
>> It's fairly easy to break. Just do a parallel import with, say, 16 concurrent tables being written to at once. Result? Fragmented tables.
>
> Is this from real-life experience? With fragmentation, there's a point of diminishing return. A couple head-seeks now and then hardly matter. My recollection is that even when there are lots of concurrent processes running that are all making files larger and larger, the Linux file system can still do a pretty good job of allocating mostly-contiguous space. It doesn't just dumbly allocate from some list, but rather tries to allocate in a way that results in pretty good "contiguousness" (if that's a word).
>
> On the other hand, this is just from reading discussion groups like this one over the last few decades; I haven't tried it...

Well, how fragmented is too fragmented depends on the use case and the hardware capability. In real-world use, which for me means about 20 phases of large bulk inserts a day and not a lot of updates or index maintenance, the system gets somewhat fragmented, but it's not too bad.

I did a dump/restore in 8.4 with parallel restore and it was much slower than usual. I did a single-threaded restore and it was much faster. The dev environments are on ext3 and we see this pretty clearly -- but poor OS tuning can mask it (readahead parameter not set high enough). This is CentOS 5.4/5.3; perhaps later kernels are better at scheduling file writes to avoid this. We also use the deadline scheduler, which helps a lot on concurrent reads but might be messing up concurrent writes.

On production with xfs this was also bad at first -- in fact worse, because xfs's default 'allocsize' setting is 64k, so files were regularly fragmented in small multiples of 64k. Changing the 'allocsize' parameter to 80MB made the restore process produce files with fragment sizes of 80MB. 80MB is big for most systems, but this array does over 1000MB/sec sequential read at peak, and only 200MB/sec with moderate fragmentation. It won't fail to allocate disk space due to any 'reservations' of the delayed allocation; it just means that it won't choose to create a new file or extent within 80MB of another file that is open unless it has to. This can cause performance problems if you have lots of small files, which is why the default is 64k.

> Craig
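[Editor's note: for anyone wanting to try Scott's allocsize change, it is an XFS mount option. A sketch follows; the device, mount point, and file path are placeholders, and the commands are left commented out because they require root and a real XFS volume.]

```shell
# Hypothetical: remount an XFS volume with an 80MB allocation hint, as
# described above, then inspect a table file's extent layout.
# mount -o remount,allocsize=80m /dev/sdb1 /var/lib/pgsql
# xfs_bmap -v /var/lib/pgsql/data/base/16384/16385   # list extents on XFS
# filefrag /var/lib/pgsql/data/base/16384/16385      # count fragments
```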
Re: [PERFORM] pg_dump far too slow
On Mar 21, 2010, at 8:50 AM, David Newall wrote:
> Tom Lane wrote:
>> I would bet that the reason for the slow throughput is that gzip is fruitlessly searching for compressible sequences. It won't find many.
>
> Indeed, I didn't expect much reduction in size, but I also didn't expect a twenty-fold increase in run-time (i.e. output at 10MB/second going down to 500KB/second), particularly as my estimate was based on gzipping a previously gzipped file. I think it's probably pathological data, as it were. Might even be of interest to gzip's maintainers.

gzip -9 is known to be very, very inefficient. It is hardly ever more compact than -7, and often 2x slower or worse. It's almost never worth using unless you don't care how long the compression takes. Try -Z1: at level 1, the output will often be compressed well enough, at rather fast speeds. It is about 6x as fast as gzip -9 and typically creates result files about 10% larger. For some compression/decompression speed benchmarks, see: http://tukaani.org/lzma/benchmarks.html
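[Editor's note: the level-1 vs level-9 gap is easy to reproduce with plain gzip. A sketch; the file names are placeholders, and random data stands in for the "pathological" already-compressed input discussed above, where -9 burns CPU searching for matches it will never find.]

```shell
# Compare gzip levels on incompressible sample data. Sizes and timings
# depend entirely on the input, but the shape (-9 much slower, barely
# smaller) usually holds for data like this.
head -c 5000000 /dev/urandom > sample.bin    # worst case: incompressible
time gzip -1 -c sample.bin > s1.gz
time gzip -9 -c sample.bin > s9.gz
ls -l sample.bin s1.gz s9.gz
```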
Re: [PERFORM] why does swap not recover?
On Fri, Mar 26, 2010 at 5:57 PM, Richard Yen wrote:
> Hi everyone,
>
> We've recently encountered some swapping issues on our CentOS 64GB Nehalem

What version of CentOS? How up to date is it? Are there any other settings that aren't defaults, in things like /etc/sysctl.conf?
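[Editor's note: one quick way to answer the question about non-default settings, assuming a standard Linux layout:]

```shell
# Show non-comment, non-blank entries in /etc/sysctl.conf -- i.e. whatever
# has been deliberately tuned on this box. '|| true' keeps the command from
# failing when the file contains no overrides at all.
grep -Ev '^[[:space:]]*(#|$)' /etc/sysctl.conf || true
```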
[PERFORM] why does swap not recover?
Hi everyone,

We've recently encountered some swapping issues on our CentOS 64GB Nehalem machine, running postgres 8.4.2. Unfortunately, I was foolish enough to set shared_buffers to 40GB. I was wondering if anyone would have any insight into why the swapping suddenly starts, but never recovers?

http://richyen.com/i/swap.png

Note, the machine has been up and running since mid-December 2009. It was only on March 8 that this swapping began, and it's never recovered.

If we look at dstat, we find the following:

http://richyen.com/i/dstat.png

Note that it is constantly paging in, but never paging out. This would indicate that it's constantly reading from swap, but never writing out to it. Why would postgres do this? (postgres is pretty much the only thing running on this machine).

I'm planning on lowering the shared_buffers to a more sane value, like 25GB (pgtune recommends this for a Mixed-purpose machine) or less (pgtune recommends 14GB for an OLTP machine). However, before I do this (and possibly resolve the issue), I was hoping to see if anyone would have an explanation for the constant reading from swap, but never writing back.

--Richard
Re: [PERFORM] Why Wal_buffer is 64KB
On Fri, Mar 26, 2010 at 7:43 AM, Pierre C wrote:
>> After fsync/synchronous_commit off
>
> Do not use fsync off, it is not safe. Who cares about the performance of fsync=off, when in practice you'd never use it with real data? synchronous_commit=off is fine for some applications, though.

There are situations where it's OK: when all the data are reproducible from other sources, etc. For instance, I have a reporting server that is a Slony slave that runs with fsync off. If it does crash, I can recreate the node in an hour or so and be back online. With fsync on, the machine is too slow to do its job, and it's not the primary repository of the real data, so it's OK there.
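[Editor's note: the setup Scott describes maps to two lines of postgresql.conf. A sketch, with the data directory path as a demo placeholder:]

```shell
# For a rebuildable reporting replica ONLY -- never the primary.
# synchronous_commit=off can lose the last few commits on a crash (but not
# corrupt anything); fsync=off can corrupt the whole cluster on a crash, so
# it is only sane when, as above, the node can be rebuilt from the master.
PGDATA=${PGDATA:-/tmp/pgdata-demo}
mkdir -p "$PGDATA"
cat >> "$PGDATA/postgresql.conf" <<'EOF'
synchronous_commit = off
fsync = off
EOF
```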
Re: [PERFORM] Why Wal_buffer is 64KB
> After fsync/synchronous_commit off

Do not use fsync off, it is not safe. Who cares about the performance of fsync=off, when in practice you'd never use it with real data? synchronous_commit=off is fine for some applications, though.

More info is needed about your configuration (hardware, drives, memory, etc).
Re: [PERFORM] Performance Tuning Large PL/PGSQL Stored Procedure
On 26/03/10 03:56, Eliot Gable wrote:
> I really am chasing milliseconds here, and I appreciate all your feedback. You've given me a relatively large number of possible optimizations I can try out. I will definitely try out the libpqtypes. That sounds like a promising way to further cut down on execution time. I think most of my performance penalty is in transferring the results back to the C++ application.

In addition to all of Merlin's good advice: if the client is on a different machine to the server, then try sticking Wireshark or similar onto the connection. That should make it pretty clear where the main costs are in getting your data back.

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] Performance Tuning Large PL/PGSQL Stored Procedure
On Thu, Mar 25, 2010 at 11:56 PM, Eliot Gable wrote:
>> How many rows min/max/avg are coming back in your refcursors? Are you using cursors in order to return multiple complex data structures (sets, etc) in a single function call?
>
> I think the largest number of rows is around 40. Most are substantially smaller. However, most of them have about two dozen or more columns, and I have already shortened the list of columns to the minimum possible. The average number of rows is around 10, but the largest sets of rows also have the most columns. I'm using the cursors in order to obtain multiple complex data structures in a single function call.

OK, small sets. Yes, passing them back to the client as arrays is probably going to be faster, and it's a trivial change to your proc. You have to define a type for your array element the way we are going to use it. You can use a composite type or a table (I prefer a table):

  create table mystuff_t
  (
    a text,
    b int,
    c timestamptz
  );

  create function myproc([...], mystuffs out mystuff_t[])

Inside the proc, replace your cursor declaration with this:

  select array
  (
    select (a,b,c)::mystuff_t from [...]
  ) into mystuffs;

Code an alternate version of the function, and then inside libpq execute the query in binary and discard the results, timing it and comparing to how you run your query now, also discarding the results. We want to time it this way because timing it from psql includes the time to print out the array in text format, which we can avoid with libpqtypes (which we are not going to mess with until we know there is a reason to go in this direction). We do need to include the time to turn around and fetch the data from the refcursors. If you see at least a 10-20% improvement, it warrants further effort IMO (and say goodbye to refcursors forever).
>> WITH clauses can make your queries much easier to read and yield great speedups if you need to access the table expression multiple times from other parts of the query. However, in some cases you can get into trouble, because a standard set of joins is going to give the planner the most flexibility in terms of query optimization.
>
> So far, every case I have converted to WITH clauses has resulted in more than double the speed (half the time required to perform the query). The main reason appears to be from avoiding calculating JOIN conditions multiple times in different parts of the query due to the UNION and EXCEPT clauses.

I have a hard time believing that, unless there are other factors compromising the planner, like bad statistics or a non-optimal query, or you are dealing with a relatively special case. 'EXCEPT', btw, is also an optimization target: maybe think about converting to 'left join where rightcol is null' or something like that. Not 100% sure; I think some work was done recently on EXCEPT, so this advice may not be as true as it used to be, and it is possibly moot if the number of rows being considered by EXCEPT is very small.

> So, you are saying that I can return a complex type as a result which contains arrays of other complex types and just use my single SELECT command to retrieve the whole data set? That would be much simpler and I imagine much faster.

Yes. However, you will want to receive as few complex types as possible, meaning your result set should still have multiple columns; reducing the number of columns is not an optimization target. In other words, do the minimal amount of stacking necessary to allow single-query extraction of data.

> I really am chasing milliseconds here, and I appreciate all your feedback. You've given me a relatively large number of possible optimizations I can try out. I will definitely try out the libpqtypes. That sounds like a promising way to further cut down on execution time. I think most of my performance penalty is in transferring the results back to the C++ application.

Yes. I've suggested libpqtypes to a number of people on the lists, and you are what I'd consider the ideal candidate. libpqtypes will completely transform the way you think about postgresql and libpq. Good luck. If you need help setting it up, you can email me privately or on the libpqtypes list.

merlin