Re: [HACKERS] Move unused buffers to freelist
On Friday, May 24, 2013 2:47 AM Jim Nasby wrote:
> On 5/14/13 2:13 PM, Greg Smith wrote:
>>> It is possible that we are told to put something in the freelist that
>>> is already in it; don't screw up the list if so.
>>
>> I don't see where the code does anything to handle that though. What
>> was your intention here?
>
> IIRC, the code that pulls from the freelist already deals with the
> possibility that a block was on the freelist but has since been put to
> use.

You are right, the check exists in StrategyGetBuffer().

> If that's the case then there shouldn't be much penalty to adding
> a block multiple times (at least within reason...)

There is a check in StrategyFreeBuffer() that prevents putting a buffer on the list multiple times; I used the same check in the new function.

With Regards,
Amit Kapila.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move unused buffers to freelist
On Thursday, May 23, 2013 8:45 PM Robert Haas wrote:
> On Tue, May 21, 2013 at 3:06 AM, Amit Kapila wrote:
>>> Here are the results. The first field in each line is the number of
>>> clients. The second number is the scale factor. The numbers after
>>> "master" and "patched" are the median of three runs.
>>> But overall, on both the read-only and read-write tests, I'm not
>>> seeing anything that resembles the big gains you reported.
>>
>> I have not generated numbers for read-write tests; I will check that
>> once. For read-only tests, the performance increase is minor and
>> different from what I saw. A few points I could think of for the
>> difference in data:
>>
>> 1. In my tests I always observed the best numbers when the number of
>> clients/threads equals the number of cores, which in your case should
>> be at 16.
>
> Sure, but you also showed substantial performance increases across a
> variety of connection counts, whereas I'm seeing basically no change
> at any connection count.
>
>> 2. I think for scale factors 100 and 300 there should not be much
>> performance increase, as they should mostly get buffers from the
>> freelist regardless of whether the bgwriter adds to the freelist or
>> not.
>
> I agree.
>
>> 3. In my tests the variation is in shared buffers; the database size
>> is always less than RAM (scale factor 1200, approx db size 16~17GB,
>> RAM 24GB), but due to variation in shared buffers, it can lead to
>> I/O.
>
> Not sure I understand this.

What I wanted to say is that all your tests were run with the same 8GB shared_buffers setting, whereas in my tests I varied shared buffers as well. However this is not an important point: if the patch has any real benefit, it should show a performance gain on the configuration you ran.

>> 4. Each run is of 20 minutes; not sure if this makes any difference.
>
> I've found that 5-minute tests are normally adequate to identify
> performance changes on the pgbench SELECT-only workload.
>>> Tests were run on a 16-core, 64-hwthread PPC64 machine provided to
>>> the PostgreSQL community courtesy of IBM. Fedora 16, Linux kernel
>>> 3.2.6.
>>
>> To think about the difference between your runs and mine, could you
>> please tell me about the below points:
>> 1. What is the RAM in the machine?
>
> 64GB
>
>> 2. Is the number of threads equal to the number of clients?
>
> Yes.
>
>> 3. Before starting tests I have always done pre-warming of buffers
>> (using pg_prewarm, written by you last year); is it the same for the
>> above read-only tests?
>
> No, I did not use pg_prewarm. But I don't think that should matter
> very much. First, the data was all in the OS cache. Second, on the
> small scale factors, everything should end up in cache pretty quickly
> anyway. And on the large scale factors, well, you're going to be
> churning shared_buffers anyway, so pg_prewarm is only going to affect
> the very beginning of the test.
>
>> 4. Can you please once again run only the test where you saw
>> variation (8 clients @ scale factor 1000 on master), because I have
>> also seen that the performance difference is very good for certain
>> configurations (scale factor, RAM, shared buffers)?
>
> I can do this if I get a chance, but I don't really see where that's
> going to get us. It seems pretty clear to me that there's no benefit
> on these tests from this patch. So either one of us is doing the
> benchmarking incorrectly, or there's some difference in our test
> environments that is significant, but none of the proposals you've
> made so far seem to me to explain the difference.

Sorry for asking you to run again without any concrete point. After reading the data you posted more carefully, I realized that the earlier reading was due to some machine problem or something else, and that there is actually no gain. After your post, I tried various configurations on different machines, but so far I have not been able to reproduce the performance gain shown in my initial mail.
In fact I tried on the same machine as well; it sometimes gives good data. I will update you if I find any concrete reason and results.

>> Apart from the above, I had one more observation during my
>> investigation into why there is a small dip in some cases:
>> 1. Many times, it finds that the buffer in the freelist is not usable,
>> meaning its refcount or usage count is not zero, due to which it has
>> to spend more time under BufFreelistLock.
>> I have not done any further experiments related to this finding, such
>> as checking whether it really adds any overhead.
>>
>> Currently I am trying to find reasons for the small dip in performance
>> and see if I can do something to avoid it. I will also run tests with
>> various configurations.
>>
>> Any other suggestions?
>
> Well, I think that the code in SyncOneBuffer is not really optimal.
> In some cases you actually lock and unlock the buffer header an extra
> time, which seems like a whole lotta extra overhead. In fact, I don't
> think you should be modifying SyncOneBuffer
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions "make install"
Re: Cédric Villemain 2013-05-17 <201305171642.59241.ced...@2ndquadrant.com>
> If it seems to be on the right way, I'll keep fixing EXTENSION building
> with VPATH.

I haven't tried the patch, but let me just say that Debian (and apt.postgresql.org) would very much like to see the VPATH situation improved. At the moment we seem to have to invent a new build recipe for every extension around.

Thanks for working on this.

Christoph
--
c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] getting rid of freezing
On 05/23/2013 10:03 PM, Andres Freund wrote:
> On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
>> We currently need to make sure we scanned the whole relation and have
>> frozen everything to have a sensible relfrozenxid for a relation.
>>
>> So, what I propose instead is basically:
>> 1) only vacuum non-all-visible pages, even when doing it for
>>    anti-wraparound
>> 2) When we can set all-visible, guarantee that all tuples on the page
>>    are fully hinted. During recovery do the same, so we don't need to
>>    log all hint bits. We can do this with only an exclusive lock on
>>    the buffer; we don't need a cleanup lock.
>> 3) When we cannot mark a page all-visible or we cannot get the cleanup
>>    lock, remember the oldest xmin on that page. We could set all
>>    visible in the former case, but we want the page to be cleaned up
>>    sometime soonish.
>> 4) If we can get the cleanup lock, purge dead tuples from the page and
>>    the indexes, just as today. Set the page as all-visible.
>>
>> That way we know that any page that is all-visible doesn't ever need
>> to look at xmin/xmax since we are sure to have set all relevant hint
>> bits.
>
> Heikki noticed that I made quite the omission here, which is that you
> would need to mark tuples as all-visible as well. I was thinking about
> using HEAP_MOVED_OFF | HEAP_MOVED_IN as a hint for that.

We could have a "vacuum_less=true" mode, where instead of marking tuples all-visible here you actually freeze them, that is, set the xid to frozen. You get less forensic capability in exchange for less vacuuming. Maybe also add an "early_freeze" hint bit to mark this situation.

Or maybe set the tuples' xid to frozen when un-marking the page as all-visible, to delay the effects a little?

Hannu

> Greetings,
>
> Andres Freund
Re: [HACKERS] Patch to .gitignore
On Fri, May 24, 2013 at 12:04 AM, Christopher Browne wrote:
> There hasn't been general agreement on the merits of particular
> .gitignore rules of this sort.
>
> You could hide your own favorite patterns by putting this into your
> ~/.gitignore that isn't part of the repo, configuring this globally,
> thus:
> git config --global core.excludesfile '~/.gitignore'

Yes... I know that...

> That has the consequence that you can hide whatever things your own
> tools like to create, and not worry about others' preferences.
>
> Us Emacs users can put things like *~, #*#, and such into our own
> "ignore" configuration; that doesn't need to bother you, and vice-versa
> for your vim-oriented patterns.

I agree with you about the vim-oriented patterns, because that is a particular tool, but the "ctags" and "etags" scripts are part of the postgres source tree and generate output inside it, so I think we must ignore that output. IMHO all output generated inside the source tree by tools and never committed must be added to .gitignore.

Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Patch to .gitignore
There hasn't been general agreement on the merits of particular .gitignore rules of this sort. You could hide your own favorite patterns by putting this into your ~/.gitignore that isn't part of the repo, configuring this globally, thus: git config --global core.excludesfile '~/.gitignore' That has the consequence that you can hide whatever things your own tools like to create, and not worry about others' preferences. Us Emacs users can put things like *~, #*#, and such into our own "ignore" configuration; that doesn't need to bother you, and vice-versa for your vim-oriented patterns.
[HACKERS] Patch to .gitignore
Hi all,

The proposed patch adds some files to ignore in .gitignore:
- tags (produced by src/tools/make_ctags)
- TAGS (produced by src/tools/make_etags)
- .*.swp (may appear in the source tree if vi/vim was killed for some reason)

Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello

gitignore_ctags_etags_vi.patch
Description: Binary data
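For reference, the effect of the attached patch on the top-level .gitignore would presumably be along these lines — a sketch based on the description above, not the actual patch contents:

```
# Editor/tool artifacts (proposed additions)
tags
TAGS
.*.swp
```

The `.*.swp` pattern matches vim's hidden swap files (e.g. `.foo.c.swp`) anywhere in the tree.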
Re: [HACKERS] getting rid of freezing
On Thu, May 23, 2013 at 1:51 PM, Andres Freund wrote: > So, what I propose instead is basically: > 1) only vacuum non-all-visible pages, even when doing it for >anti-wraparound Check. We might want an option to force a scan of the whole relation. > 2) When we can set all-visible guarantee that all tuples on the page are >fully hinted. During recovery do the same, so we don't need to log >all hint bits. >We can do this with only an exclusive lock on the buffer, we don't >need a cleanup lock. I don't think this works. Emitting XLOG_HEAP_VISIBLE for a heap page does not emit an FPI for the heap page, only (if needed) for the visibility map page. So a subsequent crash that tears the page could keep XLOG_HEAP_VISIBLE but lose other changes on the page - i.e. the hint bits. > 3) When we cannot mark a page all-visible or we cannot get the cleanup >lock, remember the oldest xmin on that page. We could set all visible >in the former case, but we want the page to be cleaned up sometime >soonish. I think you mean "in the latter case" not "in the former case". If not, then I'm confused. > 4) If we can get the cleanup lock, purge dead tuples from the page and >the indexes, just as today. Set the page as all-visible. > > That way we know that any page that is all-visible doesn't ever need to > look at xmin/xmax since we are sure to have set all relevant hint > bits. > > We don't even necessarily need to log the hint bits for all items since > the redo for all_visible could make sure all items are hinted. The only > problem is knowing up to where we can truncate pg_clog... The redo for all_visible cannot make sure all items are hinted. Again, there's no FPI on the heap page. The heap page could in fact contain dead tuples at the time we mark it all-visible. Consider, for example: 0. Checkpoint. 1. The buffer becomes all visible. 2. A tuple is inserted, making the buffer not-all-visible. 3. The page is written by the OS. 4. Crash. 
Now, recovery will first find the record marking the buffer all-visible, and will mark it all-visible. Now the all-visible bit on the page is flat-out wrong, but it doesn't matter because we haven't reached consistency. Next we'll find the heap-insert record, which will have an FPI, since it's the first WAL-logged change to the buffer since the last checkpoint. Now the FPI fixes everything and we're back in a sane state. Now in this particular case it wouldn't hurt anything if the redo routine that set the all-visible bit also hinted all the tuples, because the FPI is going to overwrite it anyway. But suppose in lieu of steps (3) and (4) we write half of the page and then crash, leaving behind a torn page. Now it's pretty crazy to think about trying to hint tuples; the page may be in a completely insane state.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Block write statistics WIP
On 23.05.2013 19:10, Greg Smith wrote:
> On 5/20/13 7:51 AM, Heikki Linnakangas wrote:
>>> The way that MarkDirty requires this specific underlying storage
>>> manager behavior to work properly strikes me as a bit of a layering
>>> violation too. I'd like the read and write paths to have a similar
>>> API, but here they don't even operate on the same type of inputs.
>>> Addressing that is probably harder than just throwing a hack on the
>>> existing code though.
>>
>> To be honest, I don't understand what you mean by that.
>
> Let's say you were designing a storage layer API from scratch for what
> Postgres does. That might take a relation as its input, like ReadBuffer
> does. Hiding the details of how that turns into a physical file
> operation would be a useful goal of such a layer. I'd then consider it
> a problem if that exposed things like the actual mapping of relations
> into files to callers.

Ok, got it.

> What we actually have right now is this MarkDirty function that
> operates on BufferTag data, which points directly to the underlying
> file. I see that as cutting the storage API in half and calling a
> function in the middle of the implementation.

Well, no, the BufferTag struct is internal to the buffer manager implementation. It's not part of the API; it's an implementation detail of the buffer manager.

> It strikes me as kind of weird that the read side and write side are
> not more symmetrical.

It might seem weird if you expect the API to be similar to POSIX read() and write(), where you can read() an arbitrary block at any location, and write() an arbitrary block at any location. A better comparison would be e.g. open() and close(). open() returns a file descriptor, which you pass to other functions to operate on the file. When you're done, you call close(fd). The file descriptor is completely opaque to the user program; you do all the operations through the functions that take the fd as an argument.
Similarly, ReadBuffer() returns a Buffer, which is completely opaque to the caller, and you do all the operations through various functions and macros that operate on the Buffer. When you're done, you release the buffer with ReleaseBuffer().

(Sorry for the digression from the original topic; I don't have any problem with adding an optional Relation argument to MarkBuffer if you need that :-))

- Heikki
Re: [HACKERS] Cost limited statements RFC
On 5/23/13 7:56 PM, Claudio Freire wrote:
> Besides the obvious option of making a lighter check (doesn't have to
> be 100% precise), wouldn't this check be done when it would otherwise
> sleep? Is it so heavy still in that context?

A commit to a typical 7200RPM disk is about 10ms, while autovacuum_vacuum_cost_delay is 20ms. If the statement cost limit logic were no more complicated than commit_delay, it would be feasible to do something similar each time a statement was being put to sleep. I suspect that the cheapest useful thing will be more expensive than commit_delay's test. That's a guess though. I'll have to think about this more when I circle back toward usability.

Thanks for the implementation idea.

--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Cost limited statements RFC
On Thu, May 23, 2013 at 8:46 PM, Greg Smith wrote:
> On 5/23/13 7:34 PM, Claudio Freire wrote:
>> Why not make the delay conditional on the amount of concurrency, kinda
>> like the commit_delay? Although in this case, it should only count
>> unwaiting connections.
>
> The test run by commit_delay is way too heavy to run after every block
> is processed. That code is only hit when there's a commit, which
> already assumes a lot of overhead is going on--the disk flush to
> WAL--so burning some processing/lock acquisition time isn't a big
> deal. The spot where statement delay is going is so performance
> sensitive that everything it touches needs to be local to the backend.

Besides the obvious option of making a lighter check (doesn't have to be 100% precise), wouldn't this check be done when it would otherwise sleep? Is it so heavy still in that context?
Re: [HACKERS] Cost limited statements RFC
On 5/23/13 7:34 PM, Claudio Freire wrote:
> Why not make the delay conditional on the amount of concurrency, kinda
> like the commit_delay? Although in this case, it should only count
> unwaiting connections.

The test run by commit_delay is way too heavy to run after every block is processed. That code is only hit when there's a commit, which already assumes a lot of overhead is going on--the disk flush to WAL--so burning some processing/lock acquisition time isn't a big deal. The spot where statement delay is going is so performance sensitive that everything it touches needs to be local to the backend.

For finding cost delayed statements that are causing trouble because they are holding locks, the only place I've thought of that runs infrequently and is poking at the right data is the deadlock detector. Turning that into a more general mechanism for finding priority inversion issues is an interesting idea. It's a bit down the road from what I'm staring at now though.

--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Cost limited statements RFC
On Thu, May 23, 2013 at 8:27 PM, Greg Smith wrote:
> The main unintended consequences issue I've found so far is when a cost
> delayed statement holds a heavy lock. Autovacuum has some protection
> against letting processes with an exclusive lock on a table go to
> sleep. It won't be easy to do that with arbitrary statements. There's
> a certain amount of allowing the user to shoot themselves in the foot
> here that will be time consuming (if not impossible) to eliminate. The
> person who runs an exclusive CLUSTER that's limited by
> statement_cost_delay may suffer from holding the lock too long. But
> that might be their intention with setting the value. Hard to idiot
> proof this without eliminating useful options too.

Why not make the delay conditional on the amount of concurrency, kinda like the commit_delay? Although in this case, it should only count unwaiting connections. That way, if there's a "delay deadlock", the delay gets out of the way.
[HACKERS] Cost limited statements RFC
I'm working on a new project here that I wanted to announce, just to keep from duplicating effort in this area. I've started to add a cost limit delay for regular statements. The idea is that you set a new statement_cost_delay setting before running something, and it will restrict total resources the same way autovacuum does. I'll be happy with it when it's good enough to throttle I/O on SELECT and CREATE INDEX CONCURRENTLY.

Modifying the buffer manager to account for statement-based cost accumulation isn't difficult. The tricky part here is finding the right spot to put the delay at. In the vacuum case, it's easy to insert a call to check for a delay after every block of I/O. It should be possible to find a single or small number of spots to put a delay check in the executor. But I expect that every utility command may need to be modified individually to find a useful delay point. This is starting to remind me of the SEPostgres refactoring, because all of the per-command uniqueness ends up requiring a lot of work to modify in a unified way.

The main unintended consequences issue I've found so far is when a cost delayed statement holds a heavy lock. Autovacuum has some protection against letting processes with an exclusive lock on a table go to sleep. It won't be easy to do that with arbitrary statements. There's a certain amount of allowing the user to shoot themselves in the foot here that will be time consuming (if not impossible) to eliminate. The person who runs an exclusive CLUSTER that's limited by statement_cost_delay may suffer from holding the lock too long. But that might be their intention with setting the value. Hard to idiot proof this without eliminating useful options too.
--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
[HACKERS] Add tests for LOCK TABLE
Hi, Please find attached a patch to take code-coverage of LOCK TABLE (src/backend/commands/lockcmds.c) from 57% to 84%. Any and all feedback is welcome. -- Robins Tharakan regress_lock.patch Description: Binary data
Re: [HACKERS] Block write statistics WIP
On 5/20/13 7:51 AM, Heikki Linnakangas wrote:
>> The way that MarkDirty requires this specific underlying storage
>> manager behavior to work properly strikes me as a bit of a layering
>> violation too. I'd like the read and write paths to have a similar
>> API, but here they don't even operate on the same type of inputs.
>> Addressing that is probably harder than just throwing a hack on the
>> existing code though.
>
> To be honest, I don't understand what you mean by that.

Let's say you were designing a storage layer API from scratch for what Postgres does. That might take a relation as its input, like ReadBuffer does. Hiding the details of how that turns into a physical file operation would be a useful goal of such a layer. I'd then consider it a problem if that exposed things like the actual mapping of relations into files to callers.

What we actually have right now is this MarkDirty function that operates on BufferTag data, which points directly to the underlying file. I see that as cutting the storage API in half and calling a function in the middle of the implementation. It strikes me as kind of weird that the read side and write side are not more symmetrical.

--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] WARNING : pgstat wait timeout - Postgres 9.1
On Thu, May 23, 2013 at 9:31 PM, Mathieu Guerin wrote:
> What are the consequences? Because this file will be removed if the
> server reboots.

Those temporary statistics are stored in the global directory when the server shuts down, so the risk here would be to lose a portion of this data in the case of a crash, either at the PG or at the OS level.

> If we change the parameter stats_temp_directory, is it necessary to
> reboot the server?

No, sending SIGHUP to the server is enough.

> When I launch a SHOW ALL; command, the parameter stats_temp_directory
> is not there.

You should get it...
stats_temp_directory | pg_stat_tmp | Writes temporary statistics files to the specified directory.
--
Michael
[HACKERS] Add more regression tests for ALTER OPERATOR FAMILY.. ADD / DROP
Hi, Please find attached a patch to take code-coverage of ALTER OPERATOR FAMILY.. ADD / DROP (src/backend/commands/opclasscmds.c) from 50% to 87%. Any and all feedback is welcome. -- Robins Tharakan regress_opclass.patch Description: Binary data
Re: [HACKERS] WAL segments (names) not in a sequence
On Thu, May 23, 2013 at 6:18 AM, German Becker wrote:
> Let me describe the process I follow to get to this. What I am doing is
> testing a migration from 8.3 to 9.1. The way I plan to do it is the
> following:
> 1) Create the schema
> 2) Import the biggest tables, which are not updated, only growing, with
> COPY (this is about 35GB of data)
> 3) Import the small, changing part of the data
>
> The target system is 9.1 with streaming replication. For steps 1 and 2,
> I set a "restore" configuration that, among other things like more
> work_mem, sets archive_mode=off and wal_level=minimal (attached the
> difference between restore and normal). The archive_command is just a
> cp wrapped in a shell script in case I need to change it.

You cannot migrate between major versions with WAL-based or streaming replication. Use either a full dump/restore, or a schema-only dump/restore plus trigger-based replication (Londiste, Slony), to migrate the data.

--
Kind regards, Sergey Konoplev
PostgreSQL Consultant and DBA
Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp Jabber: gray...@gmail.com
Re: [HACKERS] Move unused buffers to freelist
On 5/14/13 2:13 PM, Greg Smith wrote:
>> It is possible that we are told to put something in the freelist that
>> is already in it; don't screw up the list if so.
>
> I don't see where the code does anything to handle that though. What
> was your intention here?

IIRC, the code that pulls from the freelist already deals with the possibility that a block was on the freelist but has since been put to use. If that's the case then there shouldn't be much penalty to adding a block multiple times (at least within reason...)

--
Jim C. Nasby, Data Architect j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
Thanks for the response. I have some evidence against an issue in the backup procedure (though I'm not ruling it out). We moved back to taking the backup off of the primary and all errors for all three clusters went away. All of the hardware is the same, OS and postgres versions are largely the same (9.2.3 vs. 9.2.4 in some cases, various patch levels of Cent 6.3 for the OS). The backup code is exactly the same, just pointed at a different set of boxes. Currently I'm just running for a couple of days to ensure that we have viable static backups. After that I'll redo one of the restores from a suspected backup and will post the logs. -David On Thu, May 23, 2013 at 11:26 AM, Robert Haas wrote: > On Tue, May 21, 2013 at 11:59 AM, Benedikt Grundmann > wrote: > > We are seeing these errors on a regular basis on the testing box now. We > > have even changed the backup script to > > shutdown the hot standby, take lvm snapshot, restart the hot standby, > rsync > > the lvm snapshot. It still happens. > > > > We have never seen this before we introduced the hot standby. So we will > > now revert to taking the backups from lvm snapshots on the production > > database. If you have ideas of what else we should try / what > information > > we can give you to debug this let us know and we will try to so. > > > > Until then we will sadly operate on the assumption that the combination > of > > hot standby and "frozen snapshot" backup of it is not production ready. > > I'm pretty suspicious that your backup procedure is messed up in some > way. The fact that you got invalid page headers is really difficult > to attribute to a PostgreSQL bug. A number of the other messages that > you have posted also tend to indicate either corruption, or that WAL > replay has stopped early. It would be interesting to see the logs > from when the clone was first started up, juxtaposed against the later > WAL flush error messages. 
> > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: [HACKERS] getting rid of freezing
On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
> We currently need to make sure we scanned the whole relation and have
> frozen everything to have a sensible relfrozenxid for a relation.
>
> So, what I propose instead is basically:
> 1) only vacuum non-all-visible pages, even when doing it for
>    anti-wraparound
> 2) When we can set all-visible guarantee that all tuples on the page
>    are fully hinted. During recovery do the same, so we don't need to
>    log all hint bits. We can do this with only an exclusive lock on
>    the buffer, we don't need a cleanup lock.
> 3) When we cannot mark a page all-visible or we cannot get the cleanup
>    lock, remember the oldest xmin on that page. We could set all
>    visible in the former case, but we want the page to be cleaned up
>    sometime soonish.
> 4) If we can get the cleanup lock, purge dead tuples from the page and
>    the indexes, just as today. Set the page as all-visible.
>
> That way we know that any page that is all-visible doesn't ever need
> to look at xmin/xmax since we are sure to have set all relevant hint
> bits.

Heikki noticed that I made quite the omission here which is that you would need to mark tuples as all visible as well. I was thinking about using HEAP_MOVED_OFF | HEAP_MOVED_IN as a hint for that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WARNING : pgstat wait timeout - stats_temp_directory - postgres 9.1
Mathieu Guerin wrote: > Hello, > > I am facing a problem with pgstat, as my subject says. I know some topics > are already open about that, but I would like to go deeper. > > Some people have said that the best way to avoid this message is > to configure pgstat.stat to be loaded into RAM with a tmpfs mount point. > > What are the consequences? This file will be removed if the server > reboots. There are two separate files. One is the temp file, which is used while the server is running and is written very frequently. You put that one on volatile storage (stats_temp_directory) and immediately see a performance benefit. The other one is the permanent file, which is written only once, when the system is shutting down. That one is not put in stats_temp_directory, so it's safe. In case of a crash (where the server didn't have the chance to write the permanent file), stats would be reset at restart anyway, so there's no conceptual problem with the permanent file not being written. > If we change the parameter stats_temp_directory, is it necessary to reboot > the server? No, a reload (pg_ctl reload) is sufficient. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
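To make Álvaro's suggestion concrete, a setup along these lines is common (the mount point, size, and ownership below are illustrative assumptions, not recommendations from this thread):

```ini
# /etc/fstab -- a hypothetical tmpfs mount for the statistics temp file
tmpfs  /var/run/pgstat_tmp  tmpfs  size=64M,uid=postgres,gid=postgres,mode=0700  0  0

# postgresql.conf -- point only the temp file at the tmpfs;
# the permanent file stays in the data directory and survives restarts
stats_temp_directory = '/var/run/pgstat_tmp'
```

After editing postgresql.conf, a pg_ctl reload is enough to pick up the new setting, as noted above; no restart is required.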
Re: [HACKERS] gemulon.postgresql.org/gitmaster.postgresql.org
On 05/23/2013 12:20 PM, Stefan Kaltenbrunner wrote: > Hi All! > > We will be upgrading gemulon.postgresql.org during the next few > hours to the current release of Debian (wheezy/7.0), as discussed > with various people. To prevent any kind of issues we will be > locking out committers for a brief amount of time, so don't be > surprised if you get an error message. All done - happy committing! Stefan
Re: [HACKERS] getting rid of freezing
On 2013-05-23 19:51:48 +0200, Andres Freund wrote: > I think that the existence of hint bits and the crash-safe visibility > map should provide sufficient tooling to make freezing unnecessary > without losing much information for debugging, if we modify the way > vacuum works a bit. > That way we know that any page that is all-visible doesn't ever need to > look at xmin/xmax, since we are sure to have set all relevant hint > bits. One case that would make this problematic is row-level locks on tuples. We would need to unset all-visible for them; otherwise we might do the wrong thing when looking at xmax... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On Thu, May 23, 2013 at 11:10 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote: > On 23.05.2013 07:55, Robert Haas wrote: >> On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas wrote: >>> 1. Scan the WAL log of the old cluster, starting from the point where >>> the new cluster's timeline history forked off from the old cluster. For each >>> WAL record, make a note of the data blocks that are touched. This yields a >>> list of all the data blocks that were changed in the old cluster, after the >>> new cluster forked off. >> >> Suppose that a transaction is open and has written tuples at the point >> where WAL forks. After WAL forks, the transaction commits. Then, it >> hints some of the tuples that it wrote. There is no record in WAL >> that those blocks are changed, but failing to revert them leads to >> data corruption. > > Bummer, you're right. Hmm, if you have checksums enabled, however, we'll > WAL-log a full page every time a page is dirtied for setting a hint bit, > which fixes the problem. So, there's a caveat with pg_rewind; you must have > checksums enabled. I was quite impressed with the idea, but hint bits are indeed a problem. I realised the same issue also applies to the other idea that Fujii-san and others have suggested, of delaying writes of dirty buffers until the corresponding WAL has been received by the standby. But since that idea would anyway need to be implemented in the core, we could teach SetHintBits() to return false unless the corresponding commit WAL records have been written to the standby first. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
[HACKERS] getting rid of freezing
Hi, after having discussed $subject briefly over dinner yesterday, while I should have been preparing the slides for my talk, I noticed that there might be a rather easy way to get rid of freezing. I think that the existence of hint bits and the crash-safe visibility map should provide sufficient tooling to make freezing unnecessary, without losing much information for debugging, if we modify the way vacuum works a bit. Currently, aside from recovery, we only set all-visible in vacuum. vacuumlazy.c's lazy_scan_heap currently works like:

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        if (!scan_all && invisible)
            continue;

        /* cannot lock buffer immediately */
        if (!ConditionalLockBufferForCleanup(buf))
        {
            if (!scan_all)
                continue;
            /* don't block if we don't need freezing */
            if (!lazy_check_needs_freeze(buf))
                continue;
            /* now wait for cleanup lock */
            LockBufferForCleanup(buf);
        }

        for (tuple in all_tuples)
            cleanup_tuple();

        if (nfrozen > 0)
            log_heap_freeze();

        if (all_visible)
        {
            PageSetAllVisible(page);
            visibilitymap_set(page);
        }
    }

In other words, if we don't need to make sure there aren't any old tuples, we only scan visible parts of the relation. If we are doing a freeze vacuum, we scan the whole relation, waiting for a cleanup lock on each buffer if necessary. We currently need to make sure we have scanned the whole relation and have frozen everything to have a sensible relfrozenxid for a relation. So, what I propose instead is basically:

1) Only vacuum non-all-visible pages, even when doing it for anti-wraparound.

2) When we can set all-visible, guarantee that all tuples on the page are fully hinted. During recovery do the same, so we don't need to log all hint bits. We can do this with only an exclusive lock on the buffer; we don't need a cleanup lock.

3) When we cannot mark a page all-visible or we cannot get the cleanup lock, remember the oldest xmin on that page. We could set all-visible in the former case, but we want the page to be cleaned up sometime soonish.
4) If we can get the cleanup lock, purge dead tuples from the page and the indexes, just as today. Set the page as all-visible. That way we know that any page that is all-visible doesn't ever need to look at xmin/xmax since we are sure to have set all relevant hint bits. We don't even necessarily need to log the hint bits for all items since the redo for all_visible could make sure all items are hinted. The only problem is knowing up to where we can truncate pg_clog... Makes sense? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
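To make the four points above concrete, the proposed scan could be sketched in the same style as the loop quoted earlier (this is my paraphrase of the proposal, not code from any patch, and it omits the exclusive-lock-only variant of point 2):

```text
for (each page not marked all-visible in the visibility map)
{
    if (ConditionalLockBufferForCleanup(buf))
    {
        /* 4) cleanup lock acquired: purge dead tuples from the page
         * and the indexes, just as today */
        prune_page_and_indexes();
        /* 2) fully hint every tuple before declaring the page visible */
        set_all_hint_bits(page);
        PageSetAllVisible(page);
        visibilitymap_set(page);
    }
    else
    {
        /* 3) couldn't get the cleanup lock: remember the oldest xmin
         * on this page so it gets cleaned up sometime soonish */
        remember_oldest_xmin(page);
    }
}
/* 1) pages already all-visible are skipped entirely, even for
 * anti-wraparound vacuums */
```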
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On 23.05.2013 08:03, Simon Riggs wrote:
> On 23 May 2013 12:10, Heikki Linnakangas wrote:
>> Please take a look: https://github.com/vmware/pg_rewind
>
> The COPYRIGHT file shows that VMware is claiming copyright on unstated
> parts of the code for this. As such, it's not a normal submission to the
> PostgreSQL project, which involves placing copyright with the PGDG.

We have a lot of code in the PostgreSQL source tree with different copyright notices, and there's no problem with that as long as the code is licensed under the PostgreSQL license. For patches that add or modify code in PostgreSQL, we generally have copyright notices with just PGDG, to avoid having a long list of copyright notices of a lot of companies and individuals on every file. I'm no lawyer, but I believe there's no difference from the legal point of view.

> As a result, while it sounds interesting, people should be aware of that
> and I suggest we shouldn't discuss that code on this list, to avoid any
> disputes should we decide to include a similar facility in core Postgres
> in the future.

That's just paranoia. There are a lot of tools out there on pgfoundry, with various copyright holders and even different licenses, and it's fine to talk about all those on this list. Besides, the code is licensed under the PostgreSQL license, so if someone decides we should have this e.g. in contrib, you can just grab the sources and commit. Thirdly, there's no reason to refrain from even discussing this, even if someone would include a similar facility in core Postgres - this is about copyrights, not patents (and yes, this contribution has been cleared by VMware's legal department; VMware doesn't hold any patents on this). - Heikki
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On 23.05.2013 07:55, Robert Haas wrote: On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas wrote: 1. Scan the WAL log of the old cluster, starting from the point where the new cluster's timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that are touched. This yields a list of all the data blocks that were changed in the old cluster, after the new cluster forked off. Suppose that a transaction is open and has written tuples at the point where WAL forks. After WAL forks, the transaction commits. Then, it hints some of the tuples that it wrote. There is no record in WAL that those blocks are changed, but failing to revert them leads to data corruption. Bummer, you're right. Hmm, if you have checksums enabled, however, we'll WAL log a full-page every time a page is dirtied for setting a hint bit, which fixes the problem. So, there's a caveat with pg_rewind; you must have checksums enabled. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] gemulon.postgresql.org/gitmaster.postgresql.org
Hi All! We will be upgrading gemulon.postgresql.org during the next few hours to the current release of Debian (wheezy/7.0), as discussed with various people. To prevent any kind of issues we will be locking out committers for a brief amount of time, so don't be surprised if you get an error message. Stefan
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On Thu, May 23, 2013 at 11:34 AM, Fujii Masao wrote: > On Thu, May 23, 2013 at 8:55 PM, Robert Haas wrote: >> On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas >> wrote: >>> 1. Scan the WAL log of the old cluster, starting from the point where >>> the new cluster's timeline history forked off from the old cluster. For each >>> WAL record, make a note of the data blocks that are touched. This yields a >>> list of all the data blocks that were changed in the old cluster, after the >>> new cluster forked off. >> >> Suppose that a transaction is open and has written tuples at the point >> where WAL forks. After WAL forks, the transaction commits. Then, it >> hints some of the tuples that it wrote. There is no record in WAL >> that those blocks are changed, but failing to revert them leads to >> data corruption. > > Yes in asynchronous replication case. But in synchronous replication case, > after WAL forks, hint bits would not be set if their corresponding commit > record > is not replicated to the standby. The transaction commit keeps waiting > for the reply > from the standby before updating clog. So, this data corruption would not > happen > in sync case. Not necessarily. SyncRepWaitForLSN() can be interrupted via a cancel signal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
> Right, the actual signal handler will only kill the query immediately > if the backend is in a safe state (eg, while it's waiting for a > heavyweight lock). Otherwise it just sets a flag that's checked by > CHECK_FOR_INTERRUPTS. See StatementCancelHandler in postgres.c. Roger that, I will definitely have a look and ask if I have any questions. Thanks a ton! Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
Atri Sharma writes: > Right.I believe this is part of the standard way in which we handle > interrupts,right? Making sure that we cancel a query when the backend > is in a state to do so,not when the interrupt actually comes in,right? Right, the actual signal handler will only kill the query immediately if the backend is in a safe state (eg, while it's waiting for a heavyweight lock). Otherwise it just sets a flag that's checked by CHECK_FOR_INTERRUPTS. See StatementCancelHandler in postgres.c. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
On Thu, May 23, 2013 at 10:43 AM, Atri Sharma wrote: >> A little bit --- the timeout won't actually kill the query until the >> next time control reaches a CHECK_FOR_INTERRUPTS macro that's not inside >> a critical section. We've had issues in the past with particular code >> paths that failed to include such a check in a long-running loop, and >> there might still be some left. But by and large, it won't take very >> long for the query to notice the interrupt. > > Right. I believe this is part of the standard way in which we handle > interrupts, right? Making sure that we cancel a query when the backend > is in a state to do so, not when the interrupt actually comes in, right? Yes. All non-trivial parts of the code (in terms of time) must run the interrupt check. It basically just looks at the signal flag set by the signal handler. merlin
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
> > A little bit --- the timeout won't actually kill the query until the > next time control reaches a CHECK_FOR_INTERRUPTS macro that's not inside > a critical section. We've had issues in the past with particular code > paths that failed to include such a check in a long-running loop, and > there might still be some left. But by and large, it won't take very > long for the query to notice the interrupt. Right. I believe this is part of the standard way in which we handle interrupts, right? Making sure that we cancel a query when the backend is in a state to do so, not when the interrupt actually comes in, right? Regards, Atri -- Regards, Atri l'apprenant
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
Atri Sharma writes: > I am not sure, but does statement_timeout depend on *what* the query > is doing internally(i.e. if it is holding lots of locks,pins etc)? A little bit --- the timeout won't actually kill the query until the next time control reaches a CHECK_FOR_INTERRUPTS macro that's not inside a critical section. We've had issues in the past with particular code paths that failed to include such a check in a long-running loop, and there might still be some left. But by and large, it won't take very long for the query to notice the interrupt. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small typo in syncrep.h
On Thu, May 23, 2013 at 4:44 AM, Pavan Deolasee wrote: > While reading code, I noticed a small typo in syncrep.h. It says > SyncRepUpdateSyncStandbysDefined() is called by wal writer whereas its > called by the checkpointer process, at least in the HEAD. Previously, it was > being called by the bgwriter process. > > Attached patches fix this in the relevant releases. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On Thu, May 23, 2013 at 8:55 PM, Robert Haas wrote: > On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas > wrote: >> 1. Scan the WAL log of the old cluster, starting from the point where >> the new cluster's timeline history forked off from the old cluster. For each >> WAL record, make a note of the data blocks that are touched. This yields a >> list of all the data blocks that were changed in the old cluster, after the >> new cluster forked off. > > Suppose that a transaction is open and has written tuples at the point > where WAL forks. After WAL forks, the transaction commits. Then, it > hints some of the tuples that it wrote. There is no record in WAL > that those blocks are changed, but failing to revert them leads to > data corruption. Yes in asynchronous replication case. But in synchronous replication case, after WAL forks, hint bits would not be set if their corresponding commit record is not replicated to the standby. The transaction commit keeps waiting for the reply from the standby before updating clog. So, this data corruption would not happen in sync case. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
On Thu, May 23, 2013 at 8:52 PM, Amit Langote wrote: >> If you let an uncooperative user issue arbitrary SQL queries, he can >> do any number of things to put server performance into the tank. >> For instance, take out exclusive locks on all your tables and just >> go to sleep (although I think this is limited by table permissions in >> recent PG versions). Or start up an unconstrained join on some giant >> tables. Etc., etc. This isn't an area that people have felt deserved >> adding a lot of overhead to control. > > In such a case, would statement_timeout apply? If using > statement_timeout, would the longest a client can stall the server be > limited to the statement_timeout amount of time? I am not sure, but does statement_timeout depend on *what* the query is doing internally (i.e., if it is holding lots of locks, pins, etc.)? Regards, Atri -- Regards, Atri l'apprenant
Re: [HACKERS] streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
On Tue, May 21, 2013 at 11:59 AM, Benedikt Grundmann wrote: > We are seeing these errors on a regular basis on the testing box now. We > have even changed the backup script to > shutdown the hot standby, take lvm snapshot, restart the hot standby, rsync > the lvm snapshot. It still happens. > > We have never seen this before we introduced the hot standby. So we will > now revert to taking the backups from lvm snapshots on the production > database. If you have ideas of what else we should try / what information > we can give you to debug this let us know and we will try to so. > > Until then we will sadly operate on the assumption that the combination of > hot standby and "frozen snapshot" backup of it is not production ready. I'm pretty suspicious that your backup procedure is messed up in some way. The fact that you got invalid page headers is really difficult to attribute to a PostgreSQL bug. A number of the other messages that you have posted also tend to indicate either corruption, or that WAL replay has stopped early. It would be interesting to see the logs from when the clone was first started up, juxtaposed against the later WAL flush error messages. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
> No, you're not following. These would be table-level heavyweight locks, > and there already are timeout mechanisms that work at that level. Oh, right. Sorry, I missed that. I will go and research heavyweight locks now. Thanks a ton! Regards, Atri -- Regards, Atri l'apprenant
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
> > If you let an uncooperative user issue arbitrary SQL queries, he can > do any number of things to put server performance into the tank. > For instance, take out exclusive locks on all your tables and just > go to sleep (although I think this is limited by table permissions in > recent PG versions). Or start up an unconstrained join on some giant > tables. etc. etc. This isn't an area that people have felt deserved > adding a lot of overhead to control. In such a case, would statement_timeout apply? If using statement_timeout, would the longest a client can stall server be limited to statement_timeout amount of time? -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
Atri Sharma writes: >> For instance, take out exclusive locks on all your tables and just >> go to sleep (although I think this is limited by table permissions in >> recent PG versions). > This is what I have been worried about. The locks(exclusive locks or > content locks in buffers) can be held and stalled. Cant we do anything > about it? This is why I contemplated the timeout part. No, you're not following. These would be table-level heavyweight locks, and there already are timeout mechanisms that work at that level. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move unused buffers to freelist
On Tue, May 21, 2013 at 3:06 AM, Amit Kapila wrote:
>> Here are the results. The first field in each line is the number of
>> clients. The second number is the scale factor. The numbers after
>> "master" and "patched" are the median of three runs.
>>
>> 01 100 master 1433.297699 patched 1420.306088
>> 01 300 master 1371.286876 patched 1368.910732
>> 01 1000 master 1056.891901 patched 1067.341658
>> 01 3000 master 637.312651 patched 685.205011
>> 08 100 master 10575.017704 patched 11456.043638
>> 08 300 master 9262.601107 patched 9120.925071
>> 08 1000 master 1721.807658 patched 1800.733257
>> 08 3000 master 819.694049 patched 854.333830
>> 32 100 master 26981.677368 patched 27024.507600
>> 32 300 master 14554.870871 patched 14778.285400
>> 32 1000 master 1941.733251 patched 1990.248137
>> 32 3000 master 846.654654 patched 892.554222

> Is the above test for tpc-b?
> In the above tests, there is a performance increase from 1~8% and a
> decrease from 0.2~1.5%.

It's just the default pgbench workload.

>> And here's the same results for 5-minute, read-only tests:
>>
>> 01 100 master 9361.073952 patched 9049.553997
>> 01 300 master 8640.235680 patched 8646.590739
>> 01 1000 master 8339.364026 patched 8342.799468
>> 01 3000 master 7968.428287 patched 7882.121547
>> 08 100 master 71311.491773 patched 71812.899492
>> 08 300 master 69238.839225 patched 70063.632081
>> 08 1000 master 34794.778567 patched 65998.468775
>> 08 3000 master 60834.509571 patched 61165.998080
>> 32 100 master 203168.264456 patched 205258.283852
>> 32 300 master 199137.276025 patched 200391.633074
>> 32 1000 master 177996.853496 patched 176365.732087
>> 32 3000 master 149891.147442 patched 148683.269107
>>
>> Something appears to have screwed up my results for 8 clients @ scale
>> factor 300 on master,

> Do you want to say the reading at the 1000 scale factor?

Yes.

>> but overall, on both the read-only and read-write tests, I'm not seeing
>> anything that resembles the big gains you reported.
> I have not generated numbers for read-write tests; I will check that once.
> For read-only tests, the performance increase is minor and different from
> what I saw. A few points I could think of for the difference in data:
>
> 1. In my tests I always observed the best data when the number of
> clients/threads is equal to the number of cores, which in your case should
> be 16.

Sure, but you also showed substantial performance increases across a variety of connection counts, whereas I'm seeing basically no change at any connection count.

> 2. I think for scale factors 100 and 300, there should not be much
> performance increase, as for them they should mostly get a buffer from the
> freelist irrespective of whether the bgwriter adds to the freelist or not.

I agree.

> 3. In my tests the variance is in shared buffers; the database size is
> always less than RAM (scale factor 1200, approx db size 16~17GB, RAM
> 24GB), but due to the variance in shared buffers, it can lead to I/O.

Not sure I understand this.

> 4. Each run is of 20 minutes; not sure if this makes any difference.

I've found that 5-minute tests are normally adequate to identify performance changes on the pgbench SELECT-only workload.

>> Tests were run on a 16-core, 64-hwthread PPC64 machine provided to the
>> PostgreSQL community courtesy of IBM. Fedora 16, Linux kernel 3.2.6.

> To think about the difference between your runs and mine, could you please
> tell me about the points below:
> 1. What is the RAM in the machine?

64GB

> 2. Is the number of threads equal to the number of clients?

Yes.

> 3. Before starting tests I have always done pre-warming of buffers (used
> pg_prewarm, written by you last year); is it the same for the above
> read-only tests?

No, I did not use pg_prewarm. But I don't think that should matter very much. First, the data was all in the OS cache. Second, on the small scale factors, everything should end up in cache pretty quickly anyway.
And on the large scale factors, well, you're going to be churning shared_buffers anyway, so pg_prewarm is only going to affect the very beginning of the test.

> 4. Can you please once again run only the test where you saw variation (8
> clients @ scale factor 1000 on master), because I have also seen that the
> performance difference is very good for certain configurations (scale
> factor, RAM, shared buffers).

I can do this if I get a chance, but I don't really see where that's going to get us. It seems pretty clear to me that there's no benefit on these tests from this patch. So either one of us is doing the benchmarking incorrectly, or there's some difference in our test environments that is significant, but none of the proposals you've made so far seem to me to explain the difference.

> Apart from the above, I had one more observation during my investigation
> into why, in some cases, there is a small dip:
> 1. Many times it finds that the buffer in the freelist is not usable,
> meaning its refcount or usage count is not zero, due to which it has to
> spend more time under BufFreelistLock.
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
> For instance, take out exclusive locks on all your tables and just > go to sleep (although I think this is limited by table permissions in > recent PG versions). This is what I have been worried about. The locks (exclusive locks or content locks in buffers) can be held and stalled. Can't we do anything about it? This is why I contemplated the timeout part. Regards, Atri -- Regards, Atri l'apprenant
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
Atri Sharma writes: > BTW, what is your opinion on a rogue client's damaging capabilities? > Theoretically, what can a client which tries to stall the backend > target? If you let an uncooperative user issue arbitrary SQL queries, he can do any number of things to put server performance into the tank. For instance, take out exclusive locks on all your tables and just go to sleep (although I think this is limited by table permissions in recent PG versions). Or start up an unconstrained join on some giant tables. etc. etc. This isn't an area that people have felt deserved adding a lot of overhead to control. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
On Thu, May 23, 2013 at 8:18 PM, Tom Lane wrote: > Atri Sharma writes: >> I was musing over a possible condition where a rogue client gets the >> backend to process queries which take a *lot* of time(note, this is >> only in my head atm.I may be completely wrong here). > >> Wouldnt something on the lines of a timeout help here? > > You can already set statement_timeout for that. I don't think worrying > about it at the level of buffer content locks would be terribly helpful, > since those locks are generally held only for long enough to read or > write the page or to verify the visibility of rows on it. Even if the > client is rogue, it can't affect those timings too much. Right. I seem to be understanding this now. BTW, what is your opinion on a rogue client's damaging capabilities? Theoretically, what can a client which tries to stall the backend target? Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
Atri Sharma writes: > I was musing over a possible condition where a rogue client gets the > backend to process queries which take a *lot* of time(note, this is > only in my head atm.I may be completely wrong here). > Wouldnt something on the lines of a timeout help here? You can already set statement_timeout for that. I don't think worrying about it at the level of buffer content locks would be terribly helpful, since those locks are generally held only for long enough to read or write the page or to verify the visibility of rows on it. Even if the client is rogue, it can't affect those timings too much. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
On Thu, May 23, 2013 at 8:01 PM, Tom Lane wrote: > Atri Sharma writes: >> On a different note, shouldn't we have a time out for a content lock >> in buffer cache? > > No; the overhead of setting up and canceling such a timeout would > greatly outweigh any possible benefit. > > Generally speaking, LWLocks are not meant to be used in situations where > the lock hold time might be long enough to justify worrying about > timeouts. If you need that kind of behavior, use a heavyweight lock. Right, the overheads, especially in case of interruptions, would be high. I was musing over a possible condition where a rogue client gets the backend to process queries which take a *lot* of time (note, this is only in my head atm. I may be completely wrong here). Wouldn't something along the lines of a timeout help here? Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
Atri Sharma writes: > On a different note, shouldn't we have a time out for a content lock > in buffer cache? No; the overhead of setting up and canceling such a timeout would greatly outweigh any possible benefit. Generally speaking, LWLocks are not meant to be used in situations where the lock hold time might be long enough to justify worrying about timeouts. If you need that kind of behavior, use a heavyweight lock. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ASYNC Privileges proposal
Hey all, I find the current LISTEN / NOTIFY rather limited in the context of databases with multiple roles. As it stands, it is not possible to restrict the use of LISTEN or NOTIFY to specific roles, and therefore notifications (and their payloads) cannot really be trusted as coming from any particular source. If the payloads of notifications could be trusted, then applications could make better use of them, without fear of leaking any sensitive information to anyone who shouldn't be able to see it. I'd like to propose a new ASYNC database privilege that would control whether a role can use the LISTEN, NOTIFY and UNLISTEN statements and the associated pg_notify function, i.e.: GRANT ASYNC ON DATABASE TO bob; REVOKE ASYNC ON DATABASE FROM bob; SECURITY DEFINER functions could then be used anywhere that finer-grained access control was required. I had a quick play to see what might be involved [attached], and would like to hear people's thoughts; good idea, bad idea, not like that! etc. Chris. async_privileges_r0.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On 23 May 2013 07:10, Heikki Linnakangas wrote: > Hi, > > I've been hacking on a tool to allow resynchronizing an old master server > after failover. The need to do a full backup/restore has been a common > complaint ever since we've had streaming replication. I saw on the wiki that > this was discussed in the dev meeting; too bad I couldn't make it. > > In a nutshell, the idea is to copy everything that has changed between > the clusters, like rsync does, but instead of reading through all files, use > the WAL to determine what has changed. Here's a somewhat more detailed > explanation, from the README: > > Theory of operation > --- > > The basic idea is to copy everything from the new cluster to old, except for > the blocks that we know to be the same. > > 1. Scan the WAL log of the old cluster, starting from the point where > the new cluster's timeline history forked off from the old cluster. For each > WAL record, make a note of the data blocks that are touched. This yields a > list of all the data blocks that were changed in the old cluster, after the > new cluster forked off. > > 2. Copy all those changed blocks from the new master to the old master. > > 3. Copy all other files like clog, conf files etc. from the new cluster > to old. Everything except the relation files. > > 4. Apply the WAL from the new master, starting from the checkpoint > created at failover. (pg_rewind doesn't actually apply the WAL, it just > creates a backup label file indicating that when PostgreSQL is started, it > will start replay from that checkpoint and apply all the required WAL) > > > Please take a look: https://github.com/vmware/pg_rewind

6 instances set up:

[Primary (5530)]
 |
 ---[Standby 1 (5531)]
 |
 ---[Standby 2 (5532)]
 |
 ---[Standby 3 (5533)]
 |
 ---[Standby 4 (5534)]
 |
 ---[Standby 5 (5535)]

1) Created a table on the primary with some data.
2) Promoted Standby 1
3) Cleanly shut down Primary
4) pg_rewind --target-pgdata=/tmp/primary --source-server='host=localhost port=5531 dbname=postgres'

Last common WAL position: 0/30227F8 on timeline 1
Last common checkpoint at 0/30227F8 on timeline 1
error reading xlog record: record with zero length at 0/3022860
Done!

Contents of pg_xlog directory in Primary and Standby 1:

thom@swift /tmp $ ls -l primary/pg_xlog/
total 49156
-rw--- 1 thom users 16777216 May 23 09:52 00010002
-rw--- 1 thom users 16777216 May 23 09:52 00010003
-rw--- 1 thom users 16777216 May 23 09:52 00020003
-rw--- 1 thom users 41 May 23 09:52 0002.history
drwx-- 2 thom users 80 May 23 09:52 archive_status

thom@swift /tmp $ ls -l standby1/pg_xlog/
total 49156
-rw--- 1 thom users 16777216 May 23 09:49 00010002
-rw--- 1 thom users 16777216 May 23 09:50 00010003
-rw--- 1 thom users 16777216 May 23 09:52 00020003
-rw--- 1 thom users 41 May 23 09:50 0002.history
drwx-- 2 thom users 80 May 23 09:50 archive_status

5) Changed recovery.done in primary to point its primary_conninfo port to 5531 (that of Standby 1).
6) Renamed it to .conf.
7) Changed postgresql.conf to set the port back to its original one (as pg_rewind has caused it to match that of Standby 1).
8) Start Primary

Latest log in primary reads:

LOG: database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG: entering standby mode
LOG: invalid xl_info in checkpoint record
FATAL: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file "/tmp/primary/backup_label".
LOG: startup process (PID 31503) exited with exit code 1
LOG: aborting startup due to startup process failure

9) Okay, so I'll delete that label and try again.
Now all is well:

LOG: database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG: entering standby mode
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/3022828
LOG: record with zero length at 0/3041A60
LOG: consistent recovery state reached at 0/3041A60
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/300 on timeline 2

10) Connect to Standby 1 and insert more rows into our original table.
11) Connect to Primary and those rows are appearing.

-- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL segments (names) not in a sequence
On Thu, May 23, 2013 at 5:29 AM, Sergey Konoplev wrote: > On Thu, May 23, 2013 at 1:25 AM, Amit Langote > wrote: > > Okay, now I understand. Also, looking at his "ls -l pg_xlog", I could > > find that modified timestamps of all those pre-allocated segments are > > about similar (around 12:10), whereas the latest modified time (15:37) > > is of segment 0001000E00A7. > > > > Wonder if whatever configuration he is using is sub-optimal that these > > many WAL segments can be re-cycled upon a checkpoint? Or is this okay? > > Is archive_mode=on? > What is archive_command? > Is the server in recovery mode? > > -- > Kind regards, > Sergey Konoplev > PostgreSQL Consultant and DBA > > Profile: http://www.linkedin.com/in/grayhemp > Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979 > Skype: gray-hemp > Jabber: gray...@gmail.com > Hi Sergey and all, Let me describe the process I follow to get to this. What I am doing is testing a migration from 8.3 to 9.1. The way I plan to do it is the following: 1) Create the schema. 2) Import the biggest tables, which are not updated, only growing, with COPY (this is about 35 GB of data). 3) Import the small, changing part of the data. The target system is 9.1 with streaming replication. For steps 1 and 2, I set a "restore" configuration that, among other things like more work_mem, sets archive_mode=off and wal_level=minimal (attached is the difference between restore and normal). The archive_command is just a cp wrapped in a shell script in case I need to change it. Let me know if you need any more info. restorevsnormal.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MVCC catalog access
On Wed, May 22, 2013 at 11:11 PM, Andres Freund wrote: > Make that actually having acquired an xid. We skip a large part of the > work if a transaction doesn't yet have one afair. I don't think the mere > presence of 600 idle connections without an xid in contrast to just > having max_connection at 600 should actually make a difference in the > cost of acquiring a snapshot? Attached is a slightly updated version of the patch I'm using for testing, and an updated version of the pg_cxn source that I'm using to open lotsa connections. With this version, I can do this: ./pg_cxn -n 600 -c BEGIN -c 'SELECT txid_current()' ...which I think is sufficient to make sure all those transactions have XIDs. Then I reran the "depend" test case (create a schema with 1,000,000 functions and then drop the schema with CASCADE) that I mentioned in my original posting. Here are the results:

MVCC Off: Create 8685.662 ms, Drop 9973.233 ms
MVCC On:  Create 7931.039 ms, Drop 10189.189 ms
MVCC Off: Create 7810.084 ms, Drop 9594.580 ms
MVCC On:  Create 8854.577 ms, Drop 10240.024 ms

OK, let's try the rebuild-the-relcache test using the same pg_cxn scenario (600 transactions that have started a transaction and selected txid_current()).
[rhaas ~]$ time for s in `seq 1 1000`; do rm -f pgdata/global/pg_internal.init && psql -c 'SELECT 2+2' >/dev/null; done

MVCC catalog access on:
real    0m11.006s
user    0m2.746s
sys     0m2.664s

MVCC catalog access off:
real    0m10.583s
user    0m2.745s
sys     0m2.661s

MVCC catalog access on:
real    0m10.646s
user    0m2.750s
sys     0m2.661s

MVCC catalog access off:
real    0m10.823s
user    0m2.756s
sys     0m2.681s

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

mvcc-catalog-access-v2.patch
Description: Binary data

/*
 * pg_cxn.c
 */

/*
 * The archived message stripped the angle-bracketed header names; the
 * headers below are restored based on the functions used (printf/fprintf,
 * malloc/exit, sleep, getopt_long).
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>

#include "libpq-fe.h"

struct cmd_list;
typedef struct cmd_list cmd_list;

struct cmd_list
{
    char       *cmd;
    cmd_list   *next;
};

static void pg_connect(const char *conninfo, cmd_list *);
static cmd_list *add_command(cmd_list *, char *);
static void usage(void);

int
main(int argc, char **argv)
{
    int         c;
    int         n = 1;
    int         optindex;
    int         i;
    const char *conninfo;
    cmd_list   *cmds = NULL;    /* must start NULL so add_command builds the list correctly */

    while ((c = getopt_long(argc, argv, "n:c:", NULL, &optindex)) != -1)
    {
        switch (c)
        {
            case 'n':
                n = atoi(optarg);
                break;
            case 'c':
                cmds = add_command(cmds, optarg);
                break;
            default:
                usage();
                break;
        }
    }
    argv += optind;
    argc -= optind;

    if (argc > 0)
        conninfo = argv[0];
    else
        conninfo = "";

    for (i = 0; i < n; ++i)
        pg_connect(conninfo, cmds);
    printf("Established %d connections.\n", n);

    while (1)
        sleep(3600);
    return 0;
}

static cmd_list *
add_command(cmd_list *cmds, char *cmd)
{
    cmd_list   *newnode;

    newnode = malloc(sizeof(cmd_list));
    if (newnode == NULL)
    {
        perror("malloc");
        exit(1);
    }
    newnode->cmd = cmd;
    newnode->next = cmds;
    return newnode;
}

static void
pg_connect(const char *conninfo, cmd_list *cmds)
{
    PGconn     *conn;

    /* Make a connection to the database */
    conn = PQconnectdb(conninfo);

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        exit(1);
    }

    /* Execute commands. */
    while (cmds != NULL)
    {
        PGresult   *res;

        res = PQexec(conn, cmds->cmd);
        if (PQresultStatus(res) == PGRES_FATAL_ERROR)
        {
            fprintf(stderr, "%s", PQresultErrorMessage(res));
            exit(1);
        }
        if (PQresultStatus(res) != PGRES_COMMAND_OK &&
            PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "unexpected result status: %s\n",
                    PQresStatus(PQresultStatus(res)));
            exit(1);
        }
        PQclear(res);
        cmds = cmds->next;
    }
}

static void
usage(void)
{
    fprintf(stderr,
            "Usage: pg_cxn [OPTION] [CONNECTION-STRING]\n\n"
            "Options:\n"
            "  -n NUM   Number of connections to open.\n"
            "  -c SQL   SQL to execute on each connection.\n"
            "           (You can repeat this option more than once.)\n");
    exit(1);
}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WARNING : pgstat wait timeout - Postgres 9.1
Hello, I am facing a problem with pgstat, as my subject says. I know some topics are already open about that, but I would like to go deeper. Some people have said that the best way to stop getting this message is to configure pgstat.stat to be loaded into RAM via a tmpfs mount point. What are the consequences, given that this file will be removed if the server reboots? If we change the parameter stats_temp_directory, is it necessary to restart the server? When I launch a SHOW ALL; command, the parameter stats_temp_directory is not listed. For information, my pgstat.stat file is up to 1.3 MB. Thank you for your help. Math
Re: [HACKERS] Heap truncation without AccessExclusiveLock (9.4)
On 15 May 2013 16:35, Heikki Linnakangas wrote: > Shared memory space is limited, but we only need the watermarks for any > in-progress truncations. Let's keep them in shared memory, in a small > fixed-size array. That limits the number of concurrent truncations that can > be in-progress, but that should be ok. To not slow down common backend > operations, the values (or lack thereof) are cached in relcache. To sync the > relcache when the values change, there will be a new shared cache > invalidation event to force backends to refresh the cached watermark values. > A backend (vacuum) can ensure that all backends see the new value by first > updating the value in shared memory, sending the sinval message, and waiting > until everyone has received it. I think we could use a similar scheme for 2 other use cases. 1. Unlogged tables. It would be useful to have a persistent "safe high watermark" for an unlogged table. So in the event of a crash, we truncate back to the safe high watermark instead of truncating the whole table. That would get updated at each checkpoint. Unlogged tables would become much more useful with that change. (Issues with indexes would also need to be resolved.) 2. Table extension during COPY operations is difficult. We need to be able to extend in larger chunks, so we would need to change the algorithm for how extension works. I'm thinking there's a relationship there with watermarks. Can we look at the needs of multiple areas at once, so we come up with a more useful design that covers more than just one use case, please? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On 23 May 2013 12:10, Heikki Linnakangas wrote: > Please take a look: https://github.com/vmware/pg_rewind The COPYRIGHT file shows that VMware is claiming copyright on unstated parts of the code for this. As such, it's not a normal submission to the PostgreSQL project, which involves placing copyright with the PGDG. As a result, while it sounds interesting, people should be aware of that, and I suggest we shouldn't discuss that code on this list, to avoid any disputes should we decide to include a similar facility in core Postgres in the future. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas wrote: > 1. Scan the WAL log of the old cluster, starting from the point where > the new cluster's timeline history forked off from the old cluster. For each > WAL record, make a note of the data blocks that are touched. This yields a > list of all the data blocks that were changed in the old cluster, after the > new cluster forked off. Suppose that a transaction is open and has written tuples at the point where WAL forks. After WAL forks, the transaction commits. Then, it hints some of the tuples that it wrote. There is no record in WAL that those blocks are changed, but failing to revert them leads to data corruption. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
Sent from my iPad On 23-May-2013, at 16:27, Amit Kapila wrote: > On Thursday, May 23, 2013 3:35 PM Atri Sharma wrote: >> Hi all, >> >> I was searching for an existing mechanism which ensures that a crashed >> process releases a content lock held on a page in buffer cache. Also, >> a similar mechanism for refcount of the page. >> >> I seem to have missed it. Do we have a mechanism, and if yes, where can I >> find it? > > On any process (backend, bgwriter, checkpointer, ...) crash, it reinitializes > the shared memory, which will clear any locks held. > Please refer to the functions HandleChildCrash() and PostmasterStateMachine(). > > Thanks a ton. On a different note, shouldn't we have a time out for a content lock in buffer cache? There is a limit to the time any sane process would want to hold a lock, especially when many tasks can be accomplished by only pinning the page. I am just musing here, but it seems that we could potentially analyse this. I may be completely wrong. Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
Hi, I've been hacking on a tool to allow resynchronizing an old master server after failover. The need to do a full backup/restore has been a common complaint ever since we've had streaming replication. I saw on the wiki that this was discussed in the dev meeting; too bad I couldn't make it. In a nutshell, the idea is to copy everything that has changed between the clusters, like rsync does, but instead of reading through all files, use the WAL to determine what has changed. Here's a somewhat more detailed explanation, from the README: Theory of operation --- The basic idea is to copy everything from the new cluster to old, except for the blocks that we know to be the same. 1. Scan the WAL log of the old cluster, starting from the point where the new cluster's timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that are touched. This yields a list of all the data blocks that were changed in the old cluster, after the new cluster forked off. 2. Copy all those changed blocks from the new master to the old master. 3. Copy all other files like clog, conf files etc. from the new cluster to old. Everything except the relation files. 4. Apply the WAL from the new master, starting from the checkpoint created at failover. (pg_rewind doesn't actually apply the WAL, it just creates a backup label file indicating that when PostgreSQL is started, it will start replay from that checkpoint and apply all the required WAL) Please take a look: https://github.com/vmware/pg_rewind - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time limit for a process to hold Content lock in Buffer Cache
On Thursday, May 23, 2013 3:35 PM Atri Sharma wrote: > Hi all, > > I was searching for an existing mechanism which ensures that a crashed > process releases a content lock held on a page in buffer cache. Also, > a similar mechanism for refcount of the page. > > I seem to have missed it. Do we have a mechanism, and if yes, where can I > find it? On any process (backend, bgwriter, checkpointer, ...) crash, it reinitializes the shared memory, which will clear any locks held. Please refer to the functions HandleChildCrash() and PostmasterStateMachine(). With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Time limit for a process to hold Content lock in Buffer Cache
Hi all, I was searching for an existing mechanism which ensures that a crashed process releases a content lock held on a page in buffer cache. Also, a similar mechanism for the refcount of the page. I seem to have missed it. Do we have a mechanism, and if yes, where can I find it? Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Small typo in syncrep.h
Hello, While reading code, I noticed a small typo in syncrep.h. It says SyncRepUpdateSyncStandbysDefined() is called by the wal writer, whereas it's called by the checkpointer process, at least in HEAD. Previously, it was called by the bgwriter process. Attached patches fix this in the relevant releases. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee syncrep-typo-91.patch Description: Binary data syncrep-typo-head.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL segments (names) not in a sequence
On Thu, May 23, 2013 at 1:25 AM, Amit Langote wrote: > Okay, now I understand. Also, looking at his "ls -l pg_xlog", I could > find that modified timestamps of all those pre-allocated segments are > about similar (around 12:10), whereas the latest modified time (15:37) > is of segment 0001000E00A7. > > Wonder if whatever configuration he is using is sub-optimal that these > many WAL segments can be re-cycled upon a checkpoint? Or is this okay? Is archive_mode=on? What is archive_command? Is the server in the recovery mode? -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA Profile: http://www.linkedin.com/in/grayhemp Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979 Skype: gray-hemp Jabber: gray...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL segments (names) not in a sequence
>> Can pre-allocation go that further? for example, assuming >> 0001000E0080 is currently being used, then is it possible >> that a segment named/numbered 00010010007E (which does >> exist in his pg_xlog as he reported in pgsql-admin thread) is >> pre-allocated already? > > Yes, if it's so old that it's no longer required for the crash recovery. > > WAL recycling is performed by checkpoint. Checkpoint always checks > whether there are > WAL files no longer required for crash recovery, IOW, WAL files which > were generated > before the prior checkpoint happened, and then if they are found, > checkpoint tries to recycle > them. > Okay, now I understand. Also, looking at his "ls -l pg_xlog", I can see that the modified timestamps of all those pre-allocated segments are roughly the same (around 12:10), whereas the latest modified time (15:37) is of segment 0001000E00A7. I wonder if whatever configuration he is using is sub-optimal such that this many WAL segments get recycled at a checkpoint? Or is this okay? -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL segments (names) not in a sequence
On Thu, May 23, 2013 at 5:01 PM, Amit Langote wrote: >> I think these are the WAL files that were preallocated by WAL >> recycling but have not >> been used yet. >> >>> # WAL after wal_level changed from 'minimal' to 'hot_standby' >>> >>> -rw--- 1 postgres postgres 16777216 May 21 12:27 >>> 0001000E007B >>> -rw--- 1 postgres postgres 16777216 May 21 12:35 >>> 0001000E007C >>> -rw--- 1 postgres postgres 276 May 21 12:35 >>> 0001000E007C.0020.backup >>> -rw--- 1 postgres postgres 16777216 May 21 14:53 >>> 0001000E007D >>> -rw--- 1 postgres postgres 16777216 May 21 14:53 >>> 0001000E007E >>> -rw--- 1 postgres postgres 16777216 May 21 14:53 >>> 0001000E007F >>> -rw--- 1 postgres postgres 16777216 May 21 14:53 >>> 0001000E0080 >> >> These are the WAL files that you now used. So I don't think that WAL >> file sequence rewound >> in this case. >> > > Can pre-allocation go that further? for example, assuming > 0001000E0080 is currently being used, then is it possible > that a segment named/numbered 00010010007E (which does > exist in his pg_xlog as he reported in pgsql-admin thread) is > pre-allocated already? Yes, if it's so old that it's no longer required for the crash recovery. WAL recycling is performed by checkpoint. Checkpoint always checks whether there are WAL files no longer required for crash recovery, IOW, WAL files which were generated before the prior checkpoint happened, and then if they are found, checkpoint tries to recycle them. > I think we could ask the user the latest value of "select > pg_xlogfile_name(pg_xlog_current_location())". Yep. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL segments (names) not in a sequence
> I think these are the WAL files that were preallocated by WAL > recycling but have not > been used yet. > >> # WAL after wal_level changed from 'minimal' to 'hot_standby' >> >> -rw--- 1 postgres postgres 16777216 May 21 12:27 0001000E007B >> -rw--- 1 postgres postgres 16777216 May 21 12:35 0001000E007C >> -rw--- 1 postgres postgres 276 May 21 12:35 >> 0001000E007C.0020.backup >> -rw--- 1 postgres postgres 16777216 May 21 14:53 0001000E007D >> -rw--- 1 postgres postgres 16777216 May 21 14:53 0001000E007E >> -rw--- 1 postgres postgres 16777216 May 21 14:53 0001000E007F >> -rw--- 1 postgres postgres 16777216 May 21 14:53 0001000E0080 > > These are the WAL files that you now used. So I don't think that WAL > file sequence rewound > in this case. > Can pre-allocation go that far? For example, assuming 0001000E0080 is currently being used, is it possible that a segment named/numbered 00010010007E (which does exist in his pg_xlog, as he reported in the pgsql-admin thread) is pre-allocated already? I think we could ask the user for the latest value of "select pg_xlogfile_name(pg_xlog_current_location())". -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL segments (names) not in a sequence
On Thu, May 23, 2013 at 10:10 AM, Amit Langote wrote: > A PostgreSQL user recently reported on pgsql-admin about an issue: > when he changed wal_level from 'minimal' to 'hot_standby', the WAL > segment sequence rewound, that is, it started using old names. A > snippet of his "ls -lrt pg_xlog":
>
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00B3
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001001A
> -rw--- 1 postgres postgres 16777216 May 21 12:13 000100100036
> -rw--- 1 postgres postgres 16777216 May 21 12:13 00010018
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00F6
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00E4
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00F2
> -rw--- 1 postgres postgres 16777216 May 21 12:13 00010013
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00D5
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00A9
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00D6
> -rw--- 1 postgres postgres 16777216 May 21 12:13 00010010004E
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00D8
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00B4
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00BA
> -rw--- 1 postgres postgres 16777216 May 21 12:13 0001000F00C3
> -rw--- 1 postgres postgres 16777216 May 21 12:13 000100100071
> -rw--- 1 postgres postgres 16777216 May 21 12:13 000100100047
> -rw--- 1 postgres postgres 16777216 May 21 12:13 000100100026

I think these are the WAL files that were preallocated by WAL recycling but have not been used yet.
> # WAL after wal_level changed from 'minimal' to 'hot_standby'
>
> -rw--- 1 postgres postgres 16777216 May 21 12:27 0001000E007B
> -rw--- 1 postgres postgres 16777216 May 21 12:35 0001000E007C
> -rw--- 1 postgres postgres 276 May 21 12:35 0001000E007C.0020.backup
> -rw--- 1 postgres postgres 16777216 May 21 14:53 0001000E007D
> -rw--- 1 postgres postgres 16777216 May 21 14:53 0001000E007E
> -rw--- 1 postgres postgres 16777216 May 21 14:53 0001000E007F
> -rw--- 1 postgres postgres 16777216 May 21 14:53 0001000E0080

These are the WAL files that are now in use. So I don't think that the WAL file sequence rewound in this case. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers