[GENERAL] pg_ctl kill QUIT
Hello Community,

We used "pg_ctl kill QUIT <pid>" to terminate one of the processes on the production database, and the database went into recovery mode. We understand that we should not use "kill -9", and we did not do that. Could you please help us avoid this problem permanently?

Regards,
Venkat

--
DISCLAIMER: Please note that this message and any attachments may contain confidential and proprietary material and information and are intended only for the use of the intended recipient(s). If you are not the intended recipient, you are hereby notified that any review, use, disclosure, dissemination, distribution or copying of this message and any attachments is strictly prohibited. If you have received this email in error, please immediately notify the sender and delete this e-mail, whether electronic or printed. Please also note that any views, opinions, conclusions or commitments expressed in this message are those of the individual sender and do not necessarily reflect the views of *Ver sé Innovation Pvt Ltd*.
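For background: sending SIGQUIT (which is what "pg_ctl kill QUIT" does) to a single backend makes the postmaster treat that backend as crashed and put the whole cluster through crash recovery. A gentler way to end one session is from SQL, using the functions below. This is a hedged sketch; the pid 12345 is a hypothetical placeholder, and the column names shown are the 9.0-era ones (procpid, current_query):

```sql
-- Identify the offending session first (9.0-era column names).
SELECT procpid, usename, current_query FROM pg_stat_activity;

-- Cancel only the running query (sends SIGINT):
SELECT pg_cancel_backend(12345);

-- Or terminate the whole session cleanly (sends SIGTERM):
SELECT pg_terminate_backend(12345);
```

Unlike SIGQUIT, SIGTERM lets the backend exit through the normal shutdown path, so the rest of the cluster keeps running.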
Re: [GENERAL] : Postgresql Error after recovery
I restarted the cluster with "ignore_system_indexes=true" and was able to connect to the databases. I have started re-indexing, and it seems to be working fine. I will get back if I find further issues.

Regards,
Venkat

On Wed, Jul 4, 2012 at 3:35 PM, Raghavendra <raghavendra@enterprisedb.com> wrote:
>> Re-index option did not work.
>>
>> Single user mode option did not work as well -
>>
>> [postgres@localhost bin]$ postgres --single oltp_db -E -D /usr/local/postgresql-9.0.1/data
>> 2012-07-04 04:30:47 CDT [26072]: [1-1] user=,db= FATAL: index "pg_attribute_relid_attnum_index" contains unexpected zero page at block 0
>> 2012-07-04 04:30:47 CDT [26072]: [2-1] user=,db= HINT: Please REINDEX it.
>>
>> The last option would be dump and restore. We need to avoid that to reduce the downtime.
>
> How about starting the instance with ignore_system_indexes=true, and then doing the dump & restore? I am not sure about this procedure; someone here would have a better idea.
>
> http://www.postgresql.org/docs/9.0/interactive/runtime-config-developer.html
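The sequence that worked here can be sketched as follows. The data directory and database name are taken from the thread; the exact REINDEX strategy is an assumption (REINDEX SYSTEM rebuilds the catalog indexes of one database):

```shell
# Start with system indexes ignored so sessions can get past the
# corrupted catalog index (a developer option; see the 9.0 docs).
pg_ctl -D /usr/local/postgresql-9.0.1/data \
       -o "-c ignore_system_indexes=true" start

# Rebuild the catalog indexes of the affected database, then the rest.
psql -d oltp_db -c 'REINDEX SYSTEM oltp_db;'
reindexdb --all

# Restart without the developer option once reindexing completes.
pg_ctl -D /usr/local/postgresql-9.0.1/data restart
```

Note that ignore_system_indexes makes catalog lookups slow (sequential scans), so this is strictly a repair-window setting, not something to leave enabled.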
Re: [GENERAL] : Postgresql Error after recovery
On Wed, Jul 4, 2012 at 2:12 PM, Raghavendra <raghavendra@enterprisedb.com> wrote:
>>> postgres=# \c oltp_db
>>> FATAL: index "pg_attribute_relid_attnum_index" contains unexpected zero page at block 0
>>> HINT: Please REINDEX it.
>>> Previous connection kept
>>>
>>> Can anyone please help us?
>>
>> You have a HINT given to REINDEX it. Use reindexdb from the OS.

Re-index option did not work. Single user mode option did not work as well -

[postgres@localhost bin]$ postgres --single oltp_db -E -D /usr/local/postgresql-9.0.1/data
2012-07-04 04:30:47 CDT [26072]: [1-1] user=,db= FATAL: index "pg_attribute_relid_attnum_index" contains unexpected zero page at block 0
2012-07-04 04:30:47 CDT [26072]: [2-1] user=,db= HINT: Please REINDEX it.

The last option would be dump and restore. We need to avoid that to reduce the downtime.

Regards,
Venkat
[GENERAL] : Postgresql Error after recovery
Hello Community,

We have a critical situation wherein our production database server got affected by a "root kit". When we tried to build a replication site by copying the data directory to a different server, many files were missed during the copy (due to the root kit). So we moved the missing files individually, one by one, and the situation improved a bit. We are now stuck with the following issue: we are unable to connect to the databases. When we try to do so, we get the error below -

postgres=# \c oltp_db
FATAL: index "pg_attribute_relid_attnum_index" contains unexpected zero page at block 0
HINT: Please REINDEX it.
Previous connection kept

Can anyone please help us?

Regards,
VB
Re: [GENERAL] Measuring replication lag time
On Wed, Feb 22, 2012 at 5:40 PM, Stuart Bishop wrote:
> Hi.
>
> I need to measure how far in the past a hot standby is, async
> streaming replication.
>
> On the Hot Standby, "select
> age(current_timestamp,pg_last_xact_replay_timestamp())" gets me this
> (or close enough for my purposes - I understand that if there are no
> updates, there are no logs to replay and the lag time will increase).
>
> Is there some way to get this same information on the master?
> pg_stat_replication contains the log information, but I can't see how
> to map this to a timestamp.
>
> Is there a better way of measuring this?

Comparing "pg_controldata" output on the production and standby servers might help you with this.

Thanks, VB
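On the master there is no replay timestamp, but WAL positions can be compared from pg_stat_replication (9.1+). A sketch of both sides; note pg_xlog_location_diff() only appeared in 9.2, so on 9.1 the two locations have to be compared by hand:

```sql
-- On the standby: time-based lag.
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;

-- On the master (9.2+): byte-based lag per connected standby.
SELECT application_name, state,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             replay_location) AS byte_lag
FROM pg_stat_replication;
```

The byte lag answers "how much WAL is unreplayed", which is independent of whether the master is currently idle, whereas the timestamp query grows during idle periods as the poster notes.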
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
On Thu, Feb 16, 2012 at 8:14 PM, Adrian Klaver wrote:
> On Wednesday, February 15, 2012 10:21:02 pm Venkat Balaji wrote:
> > Adrian,
> >
> > Thanks a lot !
> >
> > > So in this case you are not waiting for confirmation of the commit being
> > > flushed to disk on the standby. In that case you are bypassing the primary
> > > reason for sync replication. The plus is transactions on the master will
> > > complete faster and do so in the absence of the standby. The minus is that
> > > you are in sort of an in-between state.
> >
> > I understand. My worry and requirement is to ensure the master is not
> > disturbed for any reason.
> > In sync rep, the biggest worry is that if the standby server is unavailable
> > and down for a long time, the master hangs and stays in that state until the
> > standby comes back up, or replication must be broken temporarily (until the
> > standby comes back up) so that the master runs without interruption. This is
> > a costly exercise on production from a downtime perspective.
>
> So just use regular streaming replication without sync rep. You get record-based
> transaction shipping without having to wait for the standby. You will need to
> make sure that wal_keep_segments is big enough to cover any downtime on the
> standby (you would need that for sync rep also).

As we already have streaming replication configured, we have rolled back the plan of setting up synchronous replication.

Thanks, VB
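A minimal master-side configuration for the plain streaming setup recommended in this thread might look like the fragment below. The wal_keep_segments value is an assumption and should be sized to cover the longest expected standby outage:

```
# postgresql.conf on the master (9.1)
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 256        # 256 * 16MB = 4GB of retained WAL
# synchronous_standby_names left unset => asynchronous replication
```

If the standby falls further behind than wal_keep_segments can cover, it must be re-seeded from a base backup (or fed from a WAL archive), so the value trades disk space against tolerance for standby downtime.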
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
Adrian,

Thanks a lot !

> So in this case you are not waiting for confirmation of the commit being
> flushed to disk on the standby. In that case you are bypassing the primary
> reason for sync replication. The plus is transactions on the master will
> complete faster and do so in the absence of the standby. The minus is that
> you are in sort of an in-between state.

I understand. My worry and requirement is to ensure the master is not disturbed for any reason. In sync rep, the biggest worry is that if the standby server is unavailable and down for a long time, the master hangs and stays in that state until the standby comes back up, or replication must be broken temporarily (until the standby comes back up) so that the master runs without interruption. This is a costly exercise on production from a downtime perspective.

> Personally, I take sync replication to be basically an all or nothing
> proposition. By setting it up you are saying you want, at minimum, two
> database clusters to be in sync at any point in time all the time (except
> for start up). If that is not possible then you are really looking for
> async replication.

Yeah. We will need to make a decision accordingly.

Thanks again, VB
Re: [GENERAL] High checkpoint_segments
On Wed, Feb 15, 2012 at 4:12 PM, Andres Freund wrote:
> On Wednesday, February 15, 2012 10:38:23 AM Venkat Balaji wrote:
> > On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe wrote:
> > > On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji <venkat.bal...@verse.in>
> > > > all of these 1000 files get filled up in less than 5 mins, there are
> > > > chances that the system will slow down due to high IO and CPU.
> > > As far as I know there is no data loss issue with a lot of checkpoint
> > > segments.
> > Data loss would be an issue when there is a server crash or pg_xlog crash,
> > etc. That many pg_xlog files (1000) would contribute to a huge data loss
> > (data changes not synced to the base are not guaranteed). Of course, this
> > is not related to the current situation. Normally we calculate the
> > checkpoint completion time, IO pressure, CPU load and the threat of data
> > loss when we configure checkpoint_segments.
> I think you might be misunderstanding something. A high number of
> checkpoint_segments can lead to slower recovery - all those changes need to be
> reapplied - but it won't lead to lost data. The data inside the WAL will be
> fsynced at appropriate times (commit; background writer; too much written).

Recovery would take time because all the changes in the WAL files of pg_xlog (which are many) must be replayed to reach a consistent state. When disaster strikes, if the pg_xlog files are not available and the changes in WAL have not yet been synced to the data files, then recovery is not possible and the data loss will be huge. It also depends on how much data has not been synced.

Thanks, VB
Re: [GENERAL] High checkpoint_segments
> > Data loss would be an issue when there is a server crash or pg_xlog crash,
> > etc. That many pg_xlog files (1000) would contribute to a huge data loss
> > (data changes not synced to the base are not guaranteed). Of course, this
> > is not related to the current situation. Normally we calculate the
> > checkpoint completion time, IO pressure, CPU load and the threat of data
> > loss when we configure checkpoint_segments.
>
> So you're saying that by using a small number of checkpoint segments you
> limit the data loss when the WAL gets corrupted/lost? That's a bit like
> buying a Maserati and then not going faster than 10mph because you might
> crash at higher speeds ...

No. I am not saying that checkpoint_segments must be lower. I was just trying to explain the IO overhead of setting high (as high as 1000) checkpoint segments. A lower number of checkpoint segments will lead to more frequent IO, which is not good. Agreed.

> The problem here is that the WAL is usually placed on more reliable drives
> (compared to the data files) or a RAID1 array, and as it's just writing
> data sequentially, the usage pattern is much less likely to cause
> data/drive corruption (compared to data files that need to handle a lot of
> random I/O, etc.).

Agreed.

> So while it is possible the WAL might get corrupted, the probability of data
> file corruption is much higher. And the corruption might easily happen
> silently during a checkpoint, so there won't be any WAL segments no matter
> how many of them you keep ...

Agreed. When corruption occurs, it really does not matter how many WAL segments are kept in pg_xlog. But, at any point of time, if PG needs

> And by using a low number of checkpoint segments it actually gets worse,
> because it means more frequent checkpoints -> more I/O on the drives ->
> more wearout of the drives etc.

Completely agreed, as mentioned above. I choose checkpoint_segments and checkpoint_timeout once I observe the checkpoint behavior.
> If you need to protect yourself against this, you need to keep a WAL
> archive (preferably on a separate machine) and/or a hot standby for
> failover.

WAL archiving is a different situation, wherein you back up the pg_xlog files by enabling archiving. I was referring to a specific situation wherein the pg_xlog files are not archived and the data has not yet been synced to the base files (by the bgwriter) when the system crashes. PG would then depend on pg_xlog to recover and reach a consistent state; if pg_xlog is also not available, there would be data loss, and how much depends on how much data is present in the pg_xlog files.

Thanks, VB
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
In short, I would like to understand whether I am achieving the same asynchronous streaming replication by setting synchronous_commit='local'. I understand that streaming replication is record-based log-shipping. Below is what shows up on our primary test server where we are testing synchronous replication -

*1. Synchronous setup enabled with synchronous_commit='local'*

postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
procpid          | 24099
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      |
client_hostname  |
client_port      | 56432
backend_start    | 2012-02-15 12:55:39.65663+03
state            | streaming
sent_location    | 0/E78
write_location   | 0/E78
flush_location   | 0/E78
replay_location  | 0/E78
sync_priority    | 1
sync_state       | *sync*

postgres=# show synchronous_commit ;
 synchronous_commit
--------------------
 *local*
(1 row)

postgres=# show synchronous_standby_names ;
 synchronous_standby_names
---------------------------
 *
(1 row)

Does this mean that the system is still replicating synchronously? If yes, by what means?

*Below is our actual production setup in 9.1.1 with asynchronous replication -*

*2. Asynchronous enabled with synchronous_commit='on'*

psql (9.1.1)
Type "help" for help.

postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
procpid          | 3159
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      |
client_hostname  |
client_port      | 40165
backend_start    | 2012-02-08 12:41:51.858897+03
state            | streaming
sent_location    | 1/86F83B50
write_location   | 1/86F83B50
flush_location   | 1/86F83B50
replay_location  | 1/86F83B50
sync_priority    | 0
sync_state       | *async*

postgres=# show synchronous_commit ;
 synchronous_commit
--------------------
 on
(1 row)

postgres=# show synchronous_standby_names ;
 synchronous_standby_names
---------------------------

(1 row)

Operation-wise, I am not seeing much difference when inserting a few thousand rows; the behavior is almost the same in both async and sync rep.
Thanks, VB On Wed, Feb 15, 2012 at 11:11 AM, Venkat Balaji wrote: > On Wed, Feb 15, 2012 at 11:01 AM, Venkat Balaji wrote: > >> >> On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver >> wrote: >> >>> On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote: >>> > Hello, >>> > >>> > Disaster Recovery testing for Synchronous replication setup - >>> > >>> > When the standby site is down, transactions at the production site >>> started >>> > hanging (this is after the successful setup of synchronous >>> replication). >>> > >>> > We changed synchronous_commit to 'local' to over-come this situation. >>> > >>> > - No transactions are hanging at the production site even when the >>> standby >>> > is down >>> > - Standby is automatically getting synced when it is back up again. >>> > >>> > Can someone let us know if there are any "-ve" effects of putting >>> > synchronous_commit='local' ?? >>> > >>> > I am assuming that this as good as putting "synchronous_commit=on" on >>> an >>> > stand-alone system. >>> >>> It would seem you are really after streaming replication(which is >>> asynchronous) >>> more than synchronous replication. I have not used synchronous >>> replication >>> enough to be sure, but I think by setting synchronous_commit='local' >>> you are >>> basically turning the system into a straight streaming(asynchronous) >>> system >>> anyway. >>> >> >> Sorry. Ignore my earlier message - > > Yeah. Its a kind of asynchronous ( at the transaction level, NOT WAL based > ). All i wanted to achieve is as follows - > > 1. Synchronous replication - which would perform transactions > simultaneously on production and standby. > 2. Ideally, if the commit does not occur at the standby site, then it > would not commit at the production as well, which w
Re: [GENERAL] High checkpoint_segments
On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe wrote:
> On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji <venkat.bal...@verse.in> wrote:
> >
> > On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt wrote:
> >>
> >> We need to do a few bulk updates as Rails migrations. We're a typical
> >> read-mostly web site, so at the moment, our checkpoint settings and WAL are
> >> all default (3 segments, 5 min, 16MB), and updating a million rows takes 10
> >> minutes due to all the checkpointing.
> >>
> >> We have no replication or hot standbys. As a consumer-web startup, with
> >> no SLA, and not a huge database, and if we ever do have to recover from
> >> downtime it's ok if it takes longer.. is there a reason NOT to always run
> >> with something like checkpoint_segments = 1000, as long as I leave the
> >> timeout at 5m?
> >
> > Checkpoints will still occur every 5 minutes. Anyway,
> > checkpoint_segments=1000 is huge; this implies 16MB * 1000 = 16000MB worth
> > of pg_xlog data, which is not advisable from an I/O perspective or a
> > data-loss perspective. Even in the most unimaginable case, if all of these
> > 1000 files fill up in less than 5 minutes, there is a chance the system
> > will slow down due to high IO and CPU.
>
> As far as I know there is no data loss issue with a lot of checkpoint
> segments.

Data loss would be an issue when there is a server crash or pg_xlog crash, etc. That many pg_xlog files (1000) could contribute to a huge data loss (data changes not yet synced to the base files are not guaranteed). Of course, this is not related to the current situation. Normally we consider the checkpoint completion time, IO pressure, CPU load, and the threat of data loss when we configure checkpoint_segments.
Re: [GENERAL] High checkpoint_segments
On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt wrote:
> We need to do a few bulk updates as Rails migrations. We're a typical
> read-mostly web site, so at the moment, our checkpoint settings and WAL are
> all default (3 segments, 5 min, 16MB), and updating a million rows takes 10
> minutes due to all the checkpointing.
>
> We have no replication or hot standbys. As a consumer-web startup, with
> no SLA, and not a huge database, and if we ever do have to recover from
> downtime it's ok if it takes longer.. is there a reason NOT to always run
> with something like checkpoint_segments = 1000, as long as I leave the
> timeout at 5m?

Checkpoints will still occur every 5 minutes. Anyway, checkpoint_segments=1000 is huge; this implies 16MB * 1000 = 16000MB worth of pg_xlog data, which is not advisable from an I/O perspective or a data-loss perspective. Even in the most unimaginable case, if all of these 1000 files fill up in less than 5 minutes, there is a chance the system will slow down due to high IO and CPU.

You may think of increasing checkpoint_timeout as well, but some monitoring and analysis is needed to arrive at a number. What does pg_stat_bgwriter say about checkpoints? Do you have log_checkpoints enabled?

Thanks VB
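The checkpoint monitoring suggested above can start with the pg_stat_bgwriter view; a sketch (column names as of the 9.x series):

```sql
-- Timed vs. requested checkpoints: a high checkpoints_req count means
-- checkpoint_segments is filling up before checkpoint_timeout expires,
-- i.e. the segment count is the binding limit.
SELECT checkpoints_timed,   -- triggered by checkpoint_timeout
       checkpoints_req,     -- triggered by running out of segments
       buffers_checkpoint,  -- buffers written at checkpoint time
       buffers_backend      -- buffers backends had to write themselves
FROM pg_stat_bgwriter;
```

With log_checkpoints = on in postgresql.conf, each checkpoint additionally logs how many buffers it wrote and how long the write and sync phases took, which helps choose a segment count empirically rather than guessing.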
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
On Wed, Feb 15, 2012 at 11:01 AM, Venkat Balaji wrote:
>
> On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver wrote:
>
>> On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote:
>> > Hello,
>> >
>> > Disaster Recovery testing for Synchronous replication setup -
>> >
>> > When the standby site is down, transactions at the production site started
>> > hanging (this is after the successful setup of synchronous replication).
>> >
>> > We changed synchronous_commit to 'local' to overcome this situation.
>> >
>> > - No transactions are hanging at the production site even when the standby
>> > is down
>> > - Standby is automatically getting synced when it is back up again.
>> >
>> > Can someone let us know if there are any "-ve" effects of putting
>> > synchronous_commit='local' ??
>> >
>> > I am assuming that this is as good as putting "synchronous_commit=on" on a
>> > stand-alone system.
>>
>> It would seem you are really after streaming replication (which is
>> asynchronous) more than synchronous replication. I have not used synchronous
>> replication enough to be sure, but I think by setting
>> synchronous_commit='local' you are basically turning the system into a
>> straight streaming (asynchronous) system anyway.
>
> Sorry. Ignore my earlier message -

Yeah, it is a kind of asynchronous (at the transaction level, not WAL based). All I wanted to achieve is as follows -

1. Synchronous replication, which would perform transactions simultaneously on production and standby.
2. Ideally, if the commit does not occur at the standby site, then it would not commit at production either, which will cause the production site to hang. I do not want the production site to hang if the standby site is down or not accessible.
3. I would need the commit to occur on production, and the production apps should not be disturbed if the standby fails to respond.
To achieve this, I have set synchronous_commit='local' to ensure that transactions are committed at the production site first. We do have streaming replication (on PG 9.0) set up on our other production boxes, which is asynchronous and WAL based.

Thanks VB
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver wrote:
> On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote:
> > Hello,
> >
> > Disaster Recovery testing for Synchronous replication setup -
> >
> > When the standby site is down, transactions at the production site started
> > hanging (this is after the successful setup of synchronous replication).
> >
> > We changed synchronous_commit to 'local' to overcome this situation.
> >
> > - No transactions are hanging at the production site even when the standby
> > is down
> > - Standby is automatically getting synced when it is back up again.
> >
> > Can someone let us know if there are any "-ve" effects of putting
> > synchronous_commit='local' ??
> >
> > I am assuming that this is as good as putting "synchronous_commit=on" on a
> > stand-alone system.
>
> It would seem you are really after streaming replication (which is
> asynchronous) more than synchronous replication. I have not used synchronous
> replication enough to be sure, but I think by setting
> synchronous_commit='local' you are basically turning the system into a
> straight streaming (asynchronous) system anyway.

Yeah. It's a kind of asynchronous. All I wanted is as follows - 1

> > We need to get this setup live on production shortly.
> >
> > Thanks
> > VB
>
> --
> Adrian Klaver
> adrian.kla...@gmail.com
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
Hello,

Disaster Recovery testing for the synchronous replication setup -

When the standby site is down, transactions at the production site started hanging (this is after the successful setup of synchronous replication). We changed synchronous_commit to 'local' to overcome this situation.

- No transactions are hanging at the production site even when the standby is down
- Standby is automatically getting synced when it is back up again.

Can someone let us know if there are any "-ve" effects of putting synchronous_commit='local'? I am assuming that this is as good as putting "synchronous_commit=on" on a stand-alone system. We need to get this setup live on production shortly.

Thanks VB

On Fri, Feb 10, 2012 at 4:47 PM, Venkat Balaji wrote:
> This issue is now resolved!
>
> The statements are no longer hanging on production :)
>
> The suspected problem was - our brand new production server did not have
> port 5432 open. I opened the port using the "iptables" command and
> everything started working.
>
> Synchronous replication is fast and awesome.
>
> Thanks
> VB
>
> On Fri, Feb 3, 2012 at 9:45 PM, Adrian Klaver wrote:
>> On Thursday, February 02, 2012 10:21:28 pm Venkat Balaji wrote:
>> >
>> > Connection is working fine between primary and standby, ping is working
>> > fine and WAL archive file transfer is working without any issues.
>> >
>> > I tried CREATE TABLE and CREATE DATABASE; both were hanging.
>> >
>> > Apart from regular streaming replication settings, I did the following on
>> > primary to enable synchronous replication -
>> >
>> > synchronous_standby_names='*'
>> >
>> > Commands started hanging after that. Is there anything else I need to do?
>>
>> From here:
>> http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html
>>
>> "
>> synchronous_standby_names (string)
>> ...
The synchronous standby will be the first standby named in this list >> that is >> both currently connected and streaming data in real-time (as shown by a >> state of >> streaming in the pg_stat_replication view). Other standby servers >> appearing >> later in this list represent potential synchronous standbys >> >> The name of a standby server for this purpose is the application_name >> setting of >> the standby, as set in the primary_conninfo of the standby's walreceiver. >> There >> is no mechanism to enforce uniqueness. In case of duplicates one of the >> matching >> standbys will be chosen to be the synchronous standby, though exactly >> which one >> is indeterminate. The special entry * matches any application_name, >> including >> the default application name of walreceiver. >> >> " >> >> So I would check the pg_stat_replication view to see if Postgres is >> seeing the >> standby as streaming. >> >> >> > >> > Thanks >> > VB >> >> -- >> Adrian Klaver >> adrian.kla...@gmail.com >> > >
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
This issue is now resolved!

The statements are no longer hanging on production :)

The suspected problem was - our brand new production server did not have port 5432 open. I opened the port using the "iptables" command and everything started working.

Synchronous replication is fast and awesome.

Thanks VB

On Fri, Feb 3, 2012 at 9:45 PM, Adrian Klaver wrote:
> On Thursday, February 02, 2012 10:21:28 pm Venkat Balaji wrote:
> >
> > Connection is working fine between primary and standby, ping is working
> > fine and WAL archive file transfer is working without any issues.
> >
> > I tried CREATE TABLE and CREATE DATABASE; both were hanging.
> >
> > Apart from regular streaming replication settings, I did the following on
> > primary to enable synchronous replication -
> >
> > synchronous_standby_names='*'
> >
> > Commands started hanging after that. Is there anything else I need to do?
>
> From here:
> http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html
>
> "
> synchronous_standby_names (string)
> ... The synchronous standby will be the first standby named in this list that
> is both currently connected and streaming data in real-time (as shown by a
> state of streaming in the pg_stat_replication view). Other standby servers
> appearing later in this list represent potential synchronous standbys.
>
> The name of a standby server for this purpose is the application_name setting
> of the standby, as set in the primary_conninfo of the standby's walreceiver.
> There is no mechanism to enforce uniqueness. In case of duplicates one of the
> matching standbys will be chosen to be the synchronous standby, though exactly
> which one is indeterminate. The special entry * matches any application_name,
> including the default application name of walreceiver.
> "
>
> So I would check the pg_stat_replication view to see if Postgres is seeing the
> standby as streaming.
> > > > > > Thanks > > VB > > -- > Adrian Klaver > adrian.kla...@gmail.com >
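The firewall fix described in this thread can be sketched as follows. The exact rule and the persistence command are assumptions (a RHEL-style iptables setup) and should be adapted to your distribution:

```shell
# Allow incoming PostgreSQL connections on the default port 5432.
iptables -A INPUT -p tcp --dport 5432 -j ACCEPT

# Persist the rule across reboots (RHEL/CentOS convention).
service iptables save
```

With the port closed, the standby's walreceiver could never connect, so synchronous commits on the primary waited forever; this is consistent with the hang described earlier in the thread.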
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
On Thu, Feb 2, 2012 at 8:37 PM, Adrian Klaver wrote:
> On Wednesday, February 01, 2012 10:51:44 pm Venkat Balaji wrote:
> > Hello,
> >
> > I was testing the Postgres 9.1.1 synchronous streaming replication on our
> > UAT system.
> >
> > Without synchronous replication, everything was working fine.
> >
> > But when I enabled synchronous_standby_names='*', "CREATE TABLE"
> > started hanging for a long time.
>
> Only the CREATE TABLE statement or all statements?
> In general terms, synchronous replication moves at the speed of the connection
> between the primary and standby, or does not occur if the standby can not be
> found. So what is the state of the connection between the primary and standby?

Connection is working fine between primary and standby, ping is working fine, and WAL archive file transfer is working without any issues.

I tried CREATE TABLE and CREATE DATABASE; both were hanging.

Apart from the regular streaming replication settings, I did the following on the primary to enable synchronous replication -

synchronous_standby_names='*'

Commands started hanging after that. Is there anything else I need to do?

Thanks VB
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
synchronous_commit is "on" Thanks VB On Thu, Feb 2, 2012 at 12:31 PM, Raghavendra < raghavendra@enterprisedb.com> wrote: > What is the value of synchronous_commit ? > > --- > Regards, > Raghavendra > EnterpriseDB Corporation > Blog: http://raghavt.blogspot.com/ > > > > On Thu, Feb 2, 2012 at 12:21 PM, Venkat Balaji wrote: > >> Hello, >> >> I was testing the Postgres-9.1.1 synchronous streaming replication on our >> UAT system. >> >> Without synchronous replication, everything was working fine. >> >> But, when i enabled synchronous_replication_names='*', the "create table" >> started hanging for long time. >> >> When i pressed "Ctrl+C" i got the following message - >> >> Cancel request sent >> WARNING: canceling wait for synchronous replication due to user request >> DETAIL: The transaction has already committed locally, but might not >> have been replicated to the standby. >> CREATE TABLE >> >> Can someone please help us ? >> >> Thanks >> VB >> > >
[GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
Hello,

I was testing the Postgres 9.1.1 synchronous streaming replication on our UAT system.

Without synchronous replication, everything was working fine. But when I enabled synchronous_standby_names='*', "CREATE TABLE" started hanging for a long time. When I pressed "Ctrl+C" I got the following message -

Cancel request sent
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
CREATE TABLE

Can someone please help us?

Thanks
VB
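For reference, a minimal 9.1 synchronous replication setup looks roughly like the fragment below; the host and application_name values are hypothetical placeholders:

```
# postgresql.conf on the primary
wal_level = hot_standby
max_wal_senders = 3
synchronous_standby_names = '*'   # or a specific name, e.g. 'standby1'

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=primary-host port=5432 application_name=standby1'
```

With '*', any connected standby can be chosen as the synchronous one, but if no standby is connected and streaming, every commit waits indefinitely, which matches the hang described in this message.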
[GENERAL]: streaming replication on PG-9.1.1
Hello, We have configured "streaming replication" (not synchronous) for our production on PG-9.1.1. Replication is working fine; we can see the transactions getting replicated without any issues. But I see the problem below - pg_stat_replication on the master shows no rows all the time. ps -Af | grep sender | grep -v grep -- has never shown a WAL sender process on the master ps -Af | grep receiver | grep -v grep -- has never shown a WAL receiver process on the slave Note: we do not have active transactions on the master all the time. Please help! Thanks VB
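To confirm whether a standby is actually streaming (rather than only replaying archived WAL files), the master can be queried directly; an empty result here means no WAL sender is attached, which would also explain the missing sender/receiver processes. A sketch against the 9.1 catalog (column names per that release):

```sql
-- On the master: one row per connected standby.
SELECT procpid, usename, application_name, client_addr,
       state, sent_location, replay_location, sync_state
FROM pg_stat_replication;
```

If this stays empty, the standby is likely falling back to restore_command only; checking primary_conninfo in the standby's recovery.conf and the replication entries in the master's pg_hba.conf is the usual next step.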
Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)
2011/12/2 Oleg Serov > And, I'm an idiot. > > My DB version: > PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) > 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit > > > > 2011/12/2 Oleg Serov > >> Hello, I have a problem. >> >> I've got a production server, working fine. Then I got a strange error: >> > ERROR: right sibling's left-link doesn't match: block 147 links to 407 >> instead of expected 146 in index "order_status_key"' >> I decided to back up the whole server, so I shut down the VPS and >> backed up all the data. >> Then, after I booted it, I found data loss. >> > This looks like index corruption. Did you try re-indexing? The index creation might have failed; re-indexing would rebuild the index tuples. If you suspect disk corruption, try re-creating the index (or CREATE INDEX CONCURRENTLY) on a different disk. > I've lost data that had been written to the DB over the last 10-100 hours >> (different tables have different last-updated values). >> >> Then I analyzed the log, and found this: >> 7 days ago these errors appeared: >> db= LOG: could not rename temporary statistics file >> "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat": >> db= WARNING: pgstat wait timeout >> ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076 > > This could be a free-space issue; do you have enough space on the disk holding "pg_stat_tmp"? 5 days ago: >> a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied --- >> flushed only to F/526512E0 >> 83238 db= WARNING: could not write block 54 of base/16384/2619 >> 83239 db= CONTEXT: writing block 54 of relation base/16384/2619 >> > And today: >> 18 db= LOG: could not open file "pg_xlog/0001000F0052" >> (log file 15, segment 82): >> 19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied --- >> flushed only to F/52FDF0E0 >> > >> Is there any way to recover the fresh data from the database? >> > What kind of backups do you have available? Thanks VB
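The re-index suggestion above can be sketched as follows (the table and column names are illustrative; only order_status_key is from the error message):

```sql
-- Rebuild just the damaged index (takes an exclusive lock on the table):
REINDEX INDEX order_status_key;
-- Alternatively, build a replacement alongside it without blocking writes
-- (assumed table/column), then drop the damaged index once the new one is valid:
CREATE INDEX CONCURRENTLY order_status_key_new ON orders (status);
-- DROP INDEX order_status_key;
```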
Re: [GENERAL] How to restore the table space tar files created by pg_basebackup?
Are the tablespace directories symlinked from the data directory (under pg_tblspc)? Thanks VB On Wed, Nov 30, 2011 at 7:42 PM, Samba wrote: > Hi all, > > I have taken a base backup of my master server using the pg_basebackup command > as below: > pg_basebackup -D /tmp/PostgresBackup/ -Ft -Z 9 -l masterbackup -h > localhost -U replication -w > > The above created 4 tar files, namely: 16394.tar.gz 16395.tar.gz > 16396.tar.gz base.tar.gz > > I do know that my database contains 3 table spaces in addition to > pg_default and pg_global ( I guess that is why it created those three > numbered tar.gz files, plus one base.tar.gz file ) and my master and > standby servers are identical by all means. > > Now, I'm not sure how I can restore these files on the standby server. I > could restore the base.tar.gz into the data directory on standby and the > streaming replication has started working properly. But I'm not sure what > to do with these additional numbered gz files, which contain the same data > that is already contained in the base.tar.gz file. > > Can someone explain what to do with these files? The documentation for > pg_basebackup does not mention this information; it just says that a > different variant of the command will fail if there are multiple table > spaces. > > Another related query is whether we can specify the name of the backup file > instead of leaving it to be base.tar.gz. > > Thanks and Regards, > Samba
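Each numbered tarball corresponds to one tablespace OID, the same OID that appears as a symlink under pg_tblspc/ in the data directory; on the standby each one is unpacked into the directory that symlink points to (base.tar.gz covers only the main data directory, so the numbered files are not redundant). A minimal sketch with throwaway paths, using OID 16394 from the backup above:

```shell
set -e
work=$(mktemp -d)
# Simulate one tablespace's contents on the master (16394 is the tablespace
# OID from the backup above; "12345" stands in for a relation file):
mkdir -p "$work/master_tblspc"
echo "relation data" > "$work/master_tblspc/12345"
# pg_basebackup -Ft produces one <OID>.tar.gz per tablespace:
tar -czf "$work/16394.tar.gz" -C "$work/master_tblspc" .
# On the standby: recreate the directory that the pg_tblspc/16394 symlink
# points to (same path as on the master) and unpack the matching tarball
# there before starting the server:
mkdir -p "$work/standby_tblspc"
tar -xzf "$work/16394.tar.gz" -C "$work/standby_tblspc"
ls "$work/standby_tblspc"   # prints: 12345
```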
[GENERAL] : pg_compresslog (pglesslog)
Hello Everyone, Can someone please help me know if there exists a "pglesslog" version for PG-9.0. I only see beta version (1.4.2) for pg9.0 being released sometime ago. Anyone using "pg_lesslog_1.4.2_pg90_beta.tar" for PG-9.0 production successfully ? Can we use the above said version on production ? Please help ! Thanks VB
Re: [GENERAL] Incremental backup with RSYNC or something?
On Fri, Nov 18, 2011 at 6:08 AM, Phoenix Kiula wrote: > On Mon, Nov 14, 2011 at 1:45 PM, Venkat Balaji > wrote: > >> Question: what can I do to rsync only the new additions in every table > >> starting 00:00:01 until 23:59:59 for each day? > > > > A table level replication (like Slony) should help here. > > > Slony needs more than one physical server, right? > Not necessarily; you can replicate within the same host as well. It all depends on which tables you want to replicate, not on the number of hosts. Thanks VB
Re: [GENERAL] : Postgres installation error on CentOS
Hi Alban, Thanks for the reply! I was able to resolve this issue, though not by removing the older version of zlib (I was unable to do so due to dependencies). I did not have the zlib-devel package matching the older zlib installed; once I installed it, the build succeeded. Regards, VB On Tue, Nov 15, 2011 at 5:56 PM, Alban Hertroys wrote: > On 15 November 2011 12:58, Venkat Balaji wrote: > > Hello, > > We are facing an issue while installing Postgres-9.0.1 on CentOS-5. > > That name always makes me wonder when they're releasing PennyOS or > DollarOS :P > > > Below is the error we are encountering - > > ./configure -- output > > checking for inflate in -lz... no > > configure: error: zlib library not found > ... > > /usr/bin/ld: skipping incompatible /usr/lib/libz.a when searching for -lz > > Apparently your installed libz doesn't provide a function that > configure is checking for. Perhaps upgrading it helps, if possible? > > Another possibility is that configure gets pointed to an old version > of zlib as the first result from LD. You could try removing that, but > you probably have dependencies on it from other packages. > > > Zlib rpms are installed and below is the rpm -qa output - > > [root@localhost postgresql-9.0.1]# rpm -qa | grep zlib > > zlib-1.2.3-4.el5 > > jzlib-1.0.7-4jpp.1 > > zlib-devel-1.2.3-4.el5 > > zlib-1.2.3-3 > > As a non-linux user this doesn't mean much to me. > -- > If you can't see the forest for the trees, > Cut the trees and you'll see there is no forest. >
[GENERAL] : Postgres installation error on CentOS
Hello, We are facing an issue while installing Postgres-9.0.1 on CentOS-5. Below is the error we are encountering - ./configure -- output checking for inflate in -lz... no configure: error: zlib library not found If you have zlib already installed, see config.log for details on the failure. It is possible the compiler isn't looking in the proper directory. Use --without-zlib to disable zlib support. Inside config.log, below is what we see - configure:8204: checking for inflate in -lz configure:8239: gcc -o conftest -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -D_GNU_SOURCE conftest.c -lz -lreadline -ltermcap -lcrypt -ldl -lm >&5 /usr/bin/ld: skipping incompatible /usr/lib/libz.a when searching for -lz /usr/bin/ld: cannot find -lz Zlib rpms are installed and below is the rpm -qa output - [root@localhost postgresql-9.0.1]# rpm -qa | grep zlib zlib-1.2.3-4.el5 jzlib-1.0.7-4jpp.1 zlib-devel-1.2.3-4.el5 zlib-1.2.3-3 Thanks VB
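As the follow-up in this thread notes, installing the missing zlib-devel variant fixed it. The "skipping incompatible /usr/lib/libz.a" line suggests an architecture mismatch between the compiler and the library, so a likely fix on CentOS 5 is (package name and architecture are assumptions, not from the post):

```shell
# configure links a test program against -lz; that needs a zlib-devel
# package matching the compiler's target architecture (x86_64 here):
yum install zlib-devel.x86_64
# then re-run configure from the PostgreSQL source tree:
./configure
```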
Re: [GENERAL] Incremental backup with RSYNC or something?
> > Question: what can I do to rsync only the new additions in every table > starting 00:00:01 until 23:59:59 for each day? > A table-level replication system (like Slony) should help here. Or a trigger-based approach with dblink would be another (but a bit more complex) option. Thanks VB
Re: [GENERAL] : postgres: archiver process failed on 0000000100000F72000000F0
This problem has been resolved! Thanks VB On Fri, Nov 11, 2011 at 9:58 PM, Venkat Balaji wrote: > Hello, > > The WAL archive process in our production is not working. > > [postgres@hostname]$ ps -ef | grep archive > postgres 12077 16015 0 10:19 pts/4 00:00:00 grep archive > postgres 31126 27607 0 Nov10 ? 00:01:18 postgres: archiver process > failed on 0000000100000F72000000F0 > > I see WAL files getting accumulated in the pg_xlog location and the status in > "archive_status" is shown as ".ready". > > Is there any way we can restart only the archiving process without disturbing > the actual cluster ? > > Actually, we had killed a process using "kill -9" and the db went into > recovery mode and came back up and running. > > We have no issues with the application as well. > > postgres=# select pg_is_in_recovery(); > > pg_is_in_recovery > --- > f > (1 row) > > Please help to resolve this ! > > Thanks > VB >
[GENERAL] : postgres: archiver process failed on 0000000100000F72000000F0
Hello, The WAL archive process in our production is not working. [postgres@hostname]$ ps -ef | grep archive postgres 12077 16015 0 10:19 pts/4 00:00:00 grep archive postgres 31126 27607 0 Nov10 ? 00:01:18 postgres: archiver process failed on 0000000100000F72000000F0 I see WAL files getting accumulated in the pg_xlog location and the status in "archive_status" is shown as ".ready". Is there any way we can restart only the archiving process without disturbing the actual cluster? Actually, we had killed a process using "kill -9" and the db went into recovery mode and came back up and running. We have no issues with the application as well. postgres=# select pg_is_in_recovery(); pg_is_in_recovery --- f (1 row) Please help to resolve this! Thanks VB
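The archiver retries the failed segment on its own, so no cluster restart is needed; fixing whatever makes archive_command fail (permissions, a full disk, a bad path) is enough, and the .ready backlog then drains. Illustrative checks (the paths assume a default layout and are not from the post):

```shell
# Segments still waiting to be archived:
ls "$PGDATA/pg_xlog/archive_status/" | grep '\.ready$'
# Run your configured archive_command by hand for the stuck segment to see
# the real error (the destination path here is illustrative):
cp "$PGDATA/pg_xlog/0000000100000F72000000F0" /backup/wal/0000000100000F72000000F0
```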
Re: [GENERAL] : failed: ERROR: could not open file "base/44620/972355": No such file or directory
Sorry forgot to mention the thread I referred to - http://archives.postgresql.org/pgsql-general/2010-12/msg01000.php Thanks VB On Thu, Nov 3, 2011 at 3:48 PM, Venkat Balaji wrote: > Hello Everyone, > > We had recently taken an online backup of our production database cluster > (pg_start_backup() - rsync - pg_stop_backup()). > > We had built the testing cluster with the backup. > > When we try to vacuum the database or vacuum full the testing database, > we are getting the following error. > > vacuuming of database failed: ERROR: could not open file > "base/44620/972355": No such file or directory > > In an other situation- > > postgres=# select * from table1 limit 10; > ERROR: could not open file "base/44620/1022275": No such file or directory > > After going through the below thread, one situation which we believe could > be the problem is - > > - Table1 is a partitioned table and we keep dropping and adding partitions > on daily basis. This might have happened at the time we were backing up. > > Is this the problem ? If yes, what precautions we need to take ? perform > the backup when there are no DDLs are happening ? > > Both the base files "972355" and "1022275" are not existing in production > (probably they belong to dropped partitions -- not sure). > > We have a data center migration coming up, so, this is very critical for > us - > > PG Version - 9.0.1 > OS - RHEL 5.5 (prod) and RHEL 6.1 (test) > > Please help > > Thanks > VB >
[GENERAL] : failed: ERROR: could not open file "base/44620/972355": No such file or directory
Hello Everyone, We recently took an online backup of our production database cluster (pg_start_backup() - rsync - pg_stop_backup()). We built a testing cluster from that backup. When we try to vacuum (or vacuum full) the testing database, we get the following error: vacuuming of database failed: ERROR: could not open file "base/44620/972355": No such file or directory In another situation - postgres=# select * from table1 limit 10; ERROR: could not open file "base/44620/1022275": No such file or directory After going through the thread below, one situation which we believe could be the cause is: table1 is a partitioned table, and we keep dropping and adding partitions on a daily basis. This might have happened while we were backing up. Is this the problem? If yes, what precautions do we need to take? Perform the backup only when no DDL is happening? Both the base files "972355" and "1022275" do not exist in production (probably they belong to dropped partitions -- not sure). We have a data center migration coming up, so this is very critical for us - PG Version - 9.0.1 OS - RHEL 5.5 (prod) and RHEL 6.1 (test) Please help Thanks VB
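For context, the backup sequence in question looks like this as a sketch (paths, label, and destination are illustrative). An online base backup is inconsistent by design; replaying the archived WAL during recovery is what makes it consistent again, including redoing the drop/create of partition files, which is why every archive between start and stop must be kept:

```shell
psql -U postgres -c "SELECT pg_start_backup('nightly_base')"
rsync -a --exclude 'pg_xlog/*' /usr/local/pgsql/data/ backuphost:/backups/data/
psql -U postgres -c "SELECT pg_stop_backup()"
```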
Re: [GENERAL] Server move using rsync
> > We're not doing this long-term, in order to have a backup server we can > fail-over to, but rather as a one-off low impact move of our database. > Consequently, instead of using pg_start_backup and pg_stop_backup, and > keeping all WAL, we're stopping the database, rsync of everything, and > starting the database in the new server, with it appearing to the new > server (if it was capable of noticing such things) that it had simply been > shutdown and restarted. > This is fine. If the database is shut down, the backup is completely consistent, and you can bring up the cluster from it without any issues. > The initial and repeated rsyncs while the first server is running and in > use, are solely in order to reduce the time that the rsync takes while the > postgresql application is stopped. > > Do you still think we need to do anything special with pg_start_backup, > pg_stop_backup, and WAL archives? > Yes. If the intermediate rsyncs are performed while the database cluster is up and running, then the "pg_start_backup() - rsync - pg_stop_backup()" sequence (as said earlier) must be used, so that Postgres knows a backup is in progress. pg_start_backup() performs a checkpoint and creates a backup label recording the checkpoint location and start time, and the WAL archived during the backup window is needed at recovery time to repair any half-written pages. Without doing pg_start_backup, and with rsync not performing a "snapshot" > backup, my assumption is that until we do an rsync with the service > shutdown, whatever we've got at the location we're copying to, is not > self-consistent. > The explanation above should answer this. > If we start up postgresql on it, won't it think it is recovering from a > sudden crash? 
I think it may either appear to recover ok, or complain about > various things, and not start up ok, with neither option providing us with > much insight, as all that could tell us is that either some disk blocks are > consistent, or some are not, which is our starting assumption anyway. > Starting up postgresql would probably result in more disk block changes > that will result in more work next time we rsync. > This is normal behavior of rsync. It all depends on how volatile is your system and volume of changes performed. > How badly can we screw things up, given we intend to perform a final rsync > with no postgresql services running? What should we try and avoid doing, > and why? > > We might simply compare some hashes between the two systems, of some files > that haven't had their last-modified dates changed since the last rsync. > All this will be taken care by Postgres with the help of WAL archive files generated at the time when you performed rsync with postgres services up and running. Thanks VB
Re: [GENERAL] Server move using rsync
> > "Another option is to use rsync to perform a file system backup. This is > done by first running rsync while the database server is running, then > shutting down the database server just long enough to do a second rsync. The > second rsync will be much quicker than the first, because it has relatively > little data to transfer, and the end result will be consistent because the > server was down. This method allows a file system backup to be performed > with minimal downtime." > > Except that we plan on an initial rsync which we think might take a couple > of days, then subsequent daily rsyncs for up to a week to keep it up to date > till we stop the old database, rsync again, and start the new database. > We are performing backups to our production server exactly the same way. We have been through some problems while restoring and bringing up the database. If you are planning to take initial complete rsync with subsequent "incremental rsyncs", then you need to make sure that you have all the WAL archives starting from the initial rsync on Day 1. Also are you doing the following? 1. pg_start_backup() - rsync - pg_stop_backup() ? 2. Please let us know your WAL Archive backup strategy. Is there any way during that week, that we can verify whether our partially > completed database move process is going to result in a database that starts > up ok? > In general, yes, database can start up normally. Without WAL Archives, recovering to a particular time would not be possible. Thanks VB
Re: [GENERAL] Are pg_xlog/* fiels necessary for PITR?
On Thu, Oct 27, 2011 at 7:57 PM, rihad wrote: > Hi, I'm backing up the entire server directory from time to time. pg_xlog/ > directory containing WAL files is pretty heavy (wal_level=archive). Can I > exclude it from the regular tar archive? > The best would be to perform "pg_switch_xlog()" and take a backup excluding pg_xlog. To recover the last moment TXNs, you might need pg_xlog (depends on when you would be recovering). pg_switch_xlog() will reduce the dependency on pg_xlog files to a greater extent. > > #!/bin/sh > > renice 20 $$ 2>/dev/null > pgsql -U pgsql -q -c "CHECKPOINT" postgres # speed up pg_start_backup() > pg_start_backup() performs a checkpoint and ensures that all the data till that particular checkpoint and TXN id will be backed up (or marked as needed for data consistency while restoring and recovering). > pgsql -U pgsql -q -c "select pg_start_backup('sol')" postgres > tar -cjf - /db 2>/dev/null | ssh -q -i ~pgsql/.ssh/id_rsa -p 2022 -c > blowfish dbarchive@10.0.0.1 'cat > db.tbz' > pgsql -U pgsql -q -c "select pg_stop_backup()" postgres > sleep 60 #wait for new WAL backups to appear > echo 'ssh -q dbarchive@10.0.0.1 ./post-backup.sh' | su -m pgsql > > > I want to change tar invocation to be: tar -cjf --exclude 'db/pg_xlog/*' > ... > > Will there be enough data in case of recovery? (May God forbid... ))) > But, all the WAL Archives between backup start time and end time must be backed up. They are needed at any cost for the database to be consistent and the recovery to be smooth. Recovering to any point-in-time purely depends on your backup strategy. Thanks VB
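A corrected tar invocation for the script above might look like this (note the "-" for stdout has to come directly after -cjf, before --exclude; the ssh options are from the original script):

```shell
tar -cjf - --exclude='db/pg_xlog/*' /db 2>/dev/null | \
  ssh -q -i ~pgsql/.ssh/id_rsa -p 2022 -c blowfish dbarchive@10.0.0.1 'cat > db.tbz'
```

All WAL archives generated between pg_start_backup() and pg_stop_backup() must still be retained elsewhere for the backup to be recoverable.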
Re: [GENERAL] List Permissions
My answers are inline below - How can I list a user's permissions table by table? > > i.e. User Joe > has read/write on table1 > has read on table2 > no access on table 3 > For a particular user you can use the function below; you can write a SQL query or script which takes table names from "pg_tables" one by one. has_table_privilege(user, table, privilege) Example: checking whether user "postgres" has the "select" privilege on "table1": postgres=# select has_table_privilege('postgres','public.table1','select'); has_table_privilege --------------------- t (1 row) For the current user (the user you are logged in as) you can use the following form: has_table_privilege(table, privilege) Example: postgres=# select current_user; current_user -------------- postgres (1 row) postgres=# select has_table_privilege('public.table1','select'); has_table_privilege --------------------- t (1 row) The link below covers the other access-privilege inquiry functions: http://www.postgresql.org/docs/9.0/static/functions-info.html Hope this helps! Thanks VB
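Combining has_table_privilege() with pg_tables gives the whole table-by-table listing in one query (a sketch; the user name 'joe' and the privilege set are examples):

```sql
SELECT schemaname, tablename,
       has_table_privilege('joe', quote_ident(schemaname) || '.' || quote_ident(tablename), 'SELECT') AS can_read,
       has_table_privilege('joe', quote_ident(schemaname) || '.' || quote_ident(tablename), 'INSERT') AS can_insert,
       has_table_privilege('joe', quote_ident(schemaname) || '.' || quote_ident(tablename), 'UPDATE') AS can_update
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY schemaname, tablename;
```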
Re: [GENERAL] : PostgreSQL Online Backup
Sorry for not responding to this email for so long. Alan, We had mentioned the following line in recovery.conf file (we had given pg_xlog location since we did not have WAL archives) - restore_command = 'cp /pg_xlog/%f %p' We found where the problem was - Here is what i did - 1. We had taken a full backup using pg_start_backup() and pg_stop_backup() on Day 1 2. Rest of the days (from Day 2 - Day 15), we had incrementally backed-up ( this is also using pg_start_backup() and pg_stop_backup()) 3. On Day-16th, when i started the recovery, PG was asking Day 1's WAL archive file, which we did not have. A fresh complete backup with change in our backup strategy resolved the issue. Thanks a lot for all your inputs and help on this !! Regards, VB 2011/10/4 Alan Hodgson > > > rsync works fine. Why exactly can't the recovery find the backed up > copy > > > of 000105390076? Please post your archive_command settings, > > > the contents of any script(s) called by that, and the recovery.conf > file > > > you're using that's having problems, as well as the complete process > you > > > followed to > > > initiate recovery. I strongly suspect you're missing part of the > process > > > of actually saving the WAL files needed for recovery. > > > The recovery is unable to find the WAL archive because, it was generated > on > > 26th September. > > > > Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the > files. > > > > I do not have that WAL archive copy. > > > > The problem area - > > > > I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not > > 256K). > > > > I'm going to need the rest of what I asked for to offer any further > suggestions > - especially the full and exact steps you took to initiate recovery and the > contents of recovery.conf. Also, please don't top-post. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] : PostgreSQL Online Backup
The recovery is unable to find the WAL archive because, it was generated on 26th September. Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the files. I do not have that WAL archive copy. The problem area - I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not 256K). Thanks VB 2011/10/3 Alan Hodgson > On October 3, 2011 05:33:35 AM Venkat Balaji wrote: > > Did anyone observe this behavior ?? Please help ! > > > > This is critical for us. I want to recommend not to use "rsync" (use cp > or > > scp instead) for production backup. > > > > rsync works fine. Why exactly can't the recovery find the backed up copy of > 000105390076? Please post your archive_command settings, the > contents of any script(s) called by that, and the recovery.conf file you're > using that's having problems, as well as the complete process you followed > to > initiate recovery. I strongly suspect you're missing part of the process of > actually saving the WAL files needed for recovery. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] : PostgreSQL Online Backup
Another problem in recovery (probably because of "rsync") - As said earlier, we take a production backup every day incrementally using "rsync". But somehow a few files miss getting synced, and Postgres keeps asking for back-dated archive files (from more than a week ago). I restored the October 2nd backup and PG asked for a September 26th archive file, with the last known up time being 26th Sep, 2011: 2011-10-03 07:17:12 CDT [12705]: [1-1] LOG: database system was interrupted; last known up at 2011-09-26 09:01:36 CDT 2011-10-03 07:17:12 CDT [12705]: [2-1] LOG: starting archive recovery cp: cannot stat `/usr/local/pgsql9.0.1/obtdata/data/pg_xlog/000000010000053900000076': No such file or directory 2011-10-03 07:17:12 CDT [12705]: [3-1] LOG: could not open file "pg_xlog/000000010000053900000076" (log file 1337, segment 118): No such file or directory 2011-10-03 07:17:12 CDT [12705]: [4-1] LOG: invalid checkpoint record 2011-10-03 07:17:12 CDT [12705]: [5-1] PANIC: could not locate required checkpoint record 2011-10-03 07:17:12 CDT [12705]: [6-1] HINT: If you are not restoring from a backup, try removing the file "/usr/local/pgsql9.0.1/obtdata/data/backup_label". 2011-10-03 07:17:12 CDT [12702]: [1-1] LOG: startup process (PID 12705) was terminated by signal 6: Aborted 2011-10-03 07:17:12 CDT [12702]: [2-1] LOG: aborting startup due to startup process failure I always see pg_clog files and some base files not getting synced. Below is what we are doing - pg_start_backup() rsync the data directory pg_stop_backup() The first "rsync" is fine, but the subsequent runs are generating inconsistency. We do the same every day to back up the data directory incrementally. What I observed is that PG records the TXN id whenever a backup starts and stops, plus a backup label. The next day, when PG records the start-backup time and TXN id, I think some of the TXN ids and pg_clog files generated between the last stop time and the next start time are missed. Did anyone observe this behavior? Please help! 
This is critical for us. I want to recommend not to use "rsync" (use cp or scp instead) for production backup. Thanks VB On Tue, Sep 27, 2011 at 2:36 PM, Albe Laurenz wrote: > Venkat Balaji wrote: > > Our problem is - > > > > We had mistakenly executed "rsync" on the running PostgreSQL data > directory (production) and we did > > not run "pg_start_backup()". > > > > Will this harm production ? can this lead to corruption ? > > I assume that you used rsync to copy *from* the data directory. > > This cannot lead to data corruption. > Only performance might suffer temporarily due to the additional I/O. > > The backup made with rsync will be unusable without pg_start_backup(). > > Yours, > Laurenz Albe >
Re: [GENERAL] could not access file "$libdir/pg_buffercache": No such file or directory
One way could be: if the database you are restoring is not very big, take the pg_dump with "--inserts" (a plain-SQL dump), remove the "CREATE FUNCTION" lines from the output file, and restore that. Thanks VB On Fri, Sep 30, 2011 at 10:59 AM, Venkat Balaji wrote: > I had faced the same problem 2 days earlier, and that was for the > "pg_freespacemap" contrib module. > > I did not know how to ignore these functions, so I installed the contrib > modules and restored. It worked ! > > I am also looking for a way to ignore these functions. > > Thanks > VB > > > On Fri, Sep 30, 2011 at 8:15 AM, Royce Ausburn wrote: > >> I'm in the process of testing out Postgres 9.0 for production use. I've >> been using it for development on my mac, a build from EnterpriseDB. We've >> just installed a 9.0.5 on an Ubuntu (Ubuntu 10.04.3 LTS) machine from a >> backport from lucid. There's an existing 8.4.8 postgres install also on >> that machine, but as far as I'm aware the debs are set up so you can happily >> have two installs side by side. >> >> I've dumped a test DB from my laptop and attempted to restore it on to the >> ubuntu machine, but I see errors: >> >> royce@fishy:~$ createdb test >> royce@fishy:~$ pg_restore -d test --no-owner test_RTR.pgc >> pg_restore: [archiver (db)] Error while PROCESSING TOC: >> pg_restore: [archiver (db)] Error from TOC entry 65; 1255 2194102 FUNCTION >> pg_buffercache_pages() persona >> pg_restore: [archiver (db)] could not execute query: ERROR: could not >> access file "$libdir/pg_buffercache": No such file or directory >> Command was: CREATE FUNCTION pg_buffercache_pages() RETURNS SETOF >> record >> LANGUAGE c >> AS '$libdir/pg_buffercache', 'pg_buffercache_... >> >> >> After looking around a bit, my theory on these errors is that my mac has >> some contrib module installed that adds views to my test database, and those >> views have been included in the dump. 
The contrib module is apparently not >> installed on the ubuntu machine and hence the restore can't create the >> views. Is this theory correct? If so: >> >> - Can these errors be safely ignored? (not ideal, because I'll miss other, >> real errors) >> - Is there some simple way I can prevent these views from being emitted as >> part of the dump? >> - Is there some simple way I can prevent these views from being restored >> from a dump that contains them? >> - Otherwise, how can I purge these contrib modules from my laptop? >> >> Otherwise, if my theory is incorrect, any hints as to what it might be? >> >> Cheers! >> >> --Royce >> >> Mac: >> >> version >> >> >> PostgreSQL 9.0.4 on x86_64-apple-darwin, compiled by GCC >> i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit >> (1 row) >> >> >> Ubuntu box: >> >> version >> >> - >> PostgreSQL 9.0.5 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real >> (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit >> (1 row) >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general >> > >
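With an archive-format dump (as in the pg_restore example above), the contrib functions can also be skipped without editing any SQL, by filtering the dump's table of contents (file names follow the post; the grep pattern is an example):

```shell
pg_restore -l test_RTR.pgc > full.toc                 # list the dump's contents
grep -v 'pg_buffercache' full.toc > filtered.toc      # drop the contrib entries
pg_restore -d test --no-owner -L filtered.toc test_RTR.pgc
```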
Re: [GENERAL] could not access file "$libdir/pg_buffercache": No such file or directory
I had faced the same problem 2 days earlier and that was for "pg_freespacemap" contrib module. I did not know the way to ignore these functions and installed THE contrib modules and restored. It worked ! I am also looking for a way to ignore these functions. Thanks VB On Fri, Sep 30, 2011 at 8:15 AM, Royce Ausburn wrote: > I'm in the process of testing out Postgres 9.0 for production use. I've > been using it for development on my mac, a build from EnterpriseDB. We've > just installed a 9.0.5 on an Ubuntu (Ubuntu 10.04.3 LTS) machine from a > backport from lucid. There's an existing 8.4.8 postgres install also on > that machine, but as far as I'm aware the debs are set up so you can happily > have two installs side by side. > > I've dumped a test DB from my laptop and attempted to restore it on to the > ubuntu machine, but I see errors: > > royce@fishy:~$ createdb test > royce@fishy:~$ pg_restore -d test --no-owner test_RTR.pgc > pg_restore: [archiver (db)] Error while PROCESSING TOC: > pg_restore: [archiver (db)] Error from TOC entry 65; 1255 2194102 FUNCTION > pg_buffercache_pages() persona > pg_restore: [archiver (db)] could not execute query: ERROR: could not > access file "$libdir/pg_buffercache": No such file or directory >Command was: CREATE FUNCTION pg_buffercache_pages() RETURNS SETOF record >LANGUAGE c >AS '$libdir/pg_buffercache', 'pg_buffercache_... > > > After looking around a bit, my theory on these errors is that my mac has > some contrib module installed that adds views to my test database, and those > views have been included in the dump. The contrib module is apparently not > installed on the ubuntu machine and hence the restore can't create the > views. Is this theory correct? If so: > > - Can these errors be safely ignored? (not ideal, because I'll miss other, > real errors) > - Is there some simple way I can prevent these views from being emitted as > part of the dump? 
> - Is there some simple way I can prevent these views from being restored > from a dump that contains them? > - Otherwise, how can I purge these contrib modules from my laptop? > > Otherwise, if my theory is incorrect, any hints as to what it might be? > > Cheers! > > --Royce > > Mac: > > version > > > PostgreSQL 9.0.4 on x86_64-apple-darwin, compiled by GCC > i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit > (1 row) > > > Ubuntu box: > > version > > - > PostgreSQL 9.0.5 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real > (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit > (1 row) > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] : Looking for a PostgreSQL book
Thanks Adam ! Regards, VB On Thu, Sep 29, 2011 at 12:03 AM, Adam Cornett wrote: > The same publisher (Packt) has a book *PostgreSQL > 9 Administration Cookbook* by Simon Riggs and Hannu Krosing that is > equally useful as Greg's *High Performance* book > > > On Wed, Sep 28, 2011 at 1:14 PM, Venkat Balaji wrote: > >> Hello Everyone, >> >> I have been working on PostgreSQL for quite a while (2 yrs) now. >> >> I have got "PostgreSQL 9.0 High Performance" book and quite excited to go >> through it. >> >> Please let me know any source where i can get more books on PG, I am >> especially looking for books on PG internals, architecture, Backup & >> Recovery and HA. >> >> Looking forward for the information. >> >> Regards, >> VB >> > > > > -- > Adam Cornett > adam.corn...@gmail.com > (678) 296-1150 >
[GENERAL] : Looking for a PostgreSQL book
Hello Everyone, I have been working with PostgreSQL for quite a while (2 years) now. I have got the "PostgreSQL 9.0 High Performance" book and am quite excited to go through it. Please let me know of any sources where I can get more books on PG; I am especially looking for books on PG internals, architecture, backup & recovery, and HA. Looking forward to the information. Regards, VB
Re: [GENERAL] PostgreSQL recovery when lost some file in data\global
Hi Tuan Hoang Anh, Are you able to bring up the cluster? Please let us know what problem you are facing. Thanks Venkat On Tue, Sep 27, 2011 at 12:08 PM, tuanhoanganh wrote: > I am running PostgreSQL 9.0.1 32bit on Windows 2003. Last night my disk had > some problem and I lost some files in data\global. > Is there any way to recover PostgreSQL? > > Thanks in advance. Sorry for my English. > > Tuan Hoang Anh >
Re: [GENERAL] : PostgreSQL Online Backup
Thanks for all your inputs! Our problem is: we had mistakenly executed "rsync" on the running PostgreSQL data directory (production) without running "pg_start_backup()". Will this harm production? Can this lead to corruption? Thanks - On Mon, Sep 26, 2011 at 10:29 PM, Alan Hodgson wrote: > On September 26, 2011 05:49:50 AM Venkat Balaji wrote: > > I tried restoring the backup, after taking the full backup. > > > > Below is what i see in the "archive destination". > > > > Postgres was asking for "00010193006F" and i tried to find > the > > same and below is what i find... > > > > -rw--- 1 postgres postgres 3.3M Sep 26 02:06 > > 00010193006F.gz -rw--- 1 postgres postgres 219 Sep 26 > > 02:53 > > 00010193006F.00328508.backup.gz > > > > Why is PG (9.0) putting an extension for the WAL Archive file as > > > "backup.gz" ?? > > > > The archive files are created by your archive_command, as specified in > postgresql.conf. My guess would be that your archive command runs the files > through gzip as part of archiving (which is fine). > > However, the restore_command you specify in recovery.conf must undo this > compression. So instead of (for example) 'cp -f "%f" "%p"', it might > instead > need to look like 'zcat "%f" > "%p"'. > > Hope this helps. > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
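For completeness, the compress-on-archive setup Alan describes would look roughly like the following. This is a config sketch only; the /wal_archive path is a hypothetical placeholder.

```shell
# postgresql.conf on the primary -- gzip each WAL segment while archiving:
#   archive_command = 'gzip < %p > /wal_archive/%f.gz'
#
# recovery.conf on the restoring server -- restore_command must undo the
# compression, as Alan points out:
#   restore_command = 'zcat /wal_archive/%f.gz > %p'
```

The key point is symmetry: whatever transformation archive_command applies, restore_command must reverse it exactly.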
Re: [GENERAL] : PostgreSQL Online Backup
I tried restoring the backup after taking the full backup. Below is what I see in the "archive destination". Postgres was asking for "00010193006F"; I tried to find it, and below is what I found - -rw--- 1 postgres postgres 3.3M Sep 26 02:06 00010193006F.gz -rw--- 1 postgres postgres 219 Sep 26 02:53 00010193006F.00328508.backup.gz Why is PG (9.0) putting an extension of "backup.gz" on the WAL archive file? Please help! Thanks VB On Mon, Sep 26, 2011 at 5:11 PM, Venkat Balaji wrote: > Hello Everyone, > > We have had situations where-in "rsync" was executed without executing > "pg_start_backup()" on the production data directory and on the next runs, > "pg_start_backup()" has been executed with "rsync". This was to avoid high > IO load on production. We ended up getting unmatched files (especially in > pg_clog) and not sure about "base" directory. > > Postgres is asking for WAL Archive files dated sometime around 15 days ago. > We are absolutely not sure whats going on. > > Is this dangerous for production (like corruption) ? or just the backup > will be invalid ? Please help us know if we have to perform any > precautionary checks on the production cluster. > > Apart from firing a checkpoint, does "pg_start_backup()" updates any > dictionary tables or views ? or it updates anything in "pg_xlog" > > Looking forward for your help ! > > Thanks > VB > > >
[GENERAL] : PostgreSQL Online Backup
Hello Everyone, We have had situations wherein "rsync" was executed without executing "pg_start_backup()" on the production data directory, and on subsequent runs "pg_start_backup()" was executed along with "rsync". This was done to avoid high IO load on production. We ended up with unmatched files (especially in pg_clog), and we are not sure about the "base" directory. Postgres is asking for WAL archive files dated around 15 days ago. We are absolutely not sure what is going on. Is this dangerous for production (like corruption)? Or will just the backup be invalid? Please help us know if we have to perform any precautionary checks on the production cluster. Apart from firing a checkpoint, does "pg_start_backup()" update any dictionary tables or views? Or does it update anything in "pg_xlog"? Looking forward to your help! Thanks VB
Re: [GENERAL] : Checksum ERROR when restoring Online Backup
Thanks Richard! I realized that I was restoring on a 32-bit server. Regards, Venkat On Fri, Sep 23, 2011 at 6:59 PM, Richard Huxton wrote: > On 23/09/11 13:53, Venkat Balaji wrote: > >> Linux *prod-server* 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT >> 2010 x86_64 x86_64 x86_64 GNU/Linux >> >> The server version where I restored the production backup is as follows - >> Linux *backup-server* 2.6.18-194.3.1.el5PAE #1 SMP Sun May 2 04:42:25 >> EDT 2010 i686 i686 i386 GNU/Linux >> >> I read some where that, Postgres datafiles are not architecture >> independent. >> > > They (the WAL files) are not, and it looks like you're trying to restore a > 64-bit version onto a 32-bit server. That's not going to work. A > pg_dump/restore works of course, and if you need replication then Slony can > handle this. > > -- > Richard Huxton > Archonet Ltd >
Re: [GENERAL] : Checksum ERROR when restoring Online Backup
Apologies - I did not mention the complete details of PG and the OS. Postgres 9.0.1. The operating system version of the production server where Postgres is running is as follows - Linux *prod-server* 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux The server version where I restored the production backup is as follows - Linux *backup-server* 2.6.18-194.3.1.el5PAE #1 SMP Sun May 2 04:42:25 EDT 2010 i686 i686 i386 GNU/Linux I read somewhere that Postgres data files are not architecture independent. Please help! Thanks Venkat On Fri, Sep 23, 2011 at 6:11 PM, Venkat Balaji wrote: > Hello Everyone, > > I am testing the Online Backups of our production databases ( this is part > of our disaster recovery plan ). > > After restoring the Online Backup, we tried to bring up the cluster and > ended up with the following error - > > 2011-09-23 07:29:04 CDT [24092]: [1-1] FATAL: incorrect checksum in > control file. > > Does this message mean, the Online Backup is corrupted or invalid ? > > Normally, we get recovery related error messages. This is the first time we > are facing a problem like this. > > Please help as this is critical for us. > > Thanks > Venkat >
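A file-level base backup is only usable on a matching architecture, so a cheap pre-restore sanity check is to compare the two hosts' `uname -m` outputs. The helper below is a hypothetical convenience, not a PostgreSQL tool.

```shell
# Sketch: catch 64-bit -> 32-bit restore attempts before they fail at startup.
same_arch() {
    if [ "$1" = "$2" ]; then
        echo "architectures match: $1"
    else
        echo "MISMATCH: backup is $1, target is $2 - use pg_dump/pg_restore instead" >&2
        return 1
    fi
}

# Usage (prod-server host name is from the thread; the ssh call is illustrative):
#   same_arch "$(ssh prod-server uname -m)" "$(uname -m)"
```

In the case above this would have flagged x86_64 (prod-server) versus i686 (backup-server) before any data was copied.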
[GENERAL] : Checksum ERROR when restoring Online Backup
Hello Everyone, I am testing the online backups of our production databases (this is part of our disaster recovery plan). After restoring the online backup, we tried to bring up the cluster and ended up with the following error - 2011-09-23 07:29:04 CDT [24092]: [1-1] FATAL: incorrect checksum in control file. Does this message mean that the online backup is corrupted or invalid? Normally, we get recovery-related error messages. This is the first time we are facing a problem like this. Please help, as this is critical for us. Thanks Venkat
[GENERAL] Calculate Vacuum Metrics
Hello Everyone, I am in the process of scheduling a VACUUM FULL for our production databases, where downtime is extremely critical. Can someone please help me calculate the amount of free space (or free pages) in a table and its indexes (even after regular autovacuum or vacuum analyze has run)? This will help me estimate the amount of space that will be reclaimed once the VACUUM FULL is performed. I am using the query below (which I found through Google) to calculate the wasted space and bloat in tables and indexes.

SELECT
  current_database(), schemaname, tablename,
  /*reltuples::bigint, relpages::bigint, otta,*/
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
  SELECT
    schemaname, tablename, cc.reltuples, cc.relpages, bs,
    CEIL((cc.reltuples*((datahdr+ma-
      (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta,
    COALESCE(c2.relname,'?') AS iname,
    COALESCE(c2.reltuples,0) AS ituples,
    COALESCE(c2.relpages,0) AS ipages,
    COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
  FROM (
    SELECT
      ma, bs, schemaname, tablename,
      (datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
      (maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
    FROM (
      SELECT
        schemaname, tablename, hdr, ma, bs,
        SUM((1-null_frac)*avg_width) AS datawidth,
        MAX(null_frac) AS maxfracsum,
        hdr+(
          SELECT 1+count(*)/8
          FROM pg_stats s2
          WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
        ) AS nullhdr
      FROM pg_stats s, (
        SELECT
          (SELECT current_setting('block_size')::numeric) AS bs,
          CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
          CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
        FROM (SELECT version() AS v) AS foo
      ) AS constants
      GROUP BY 1,2,3,4,5
    ) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid
    AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC;

Can I continue to use it, or is there a better way to get this information? Please help! Thanks Venkat
Re: [GENERAL] warm standby - apply wal archives
Syncing just the WAL archive directory every minute should not be a problem at all (it is running rsync every minute against the data directory that is not recommended). As said earlier, we had configured a warm standby for a database of 2 TB whose WAL archive generation ran into the hundreds, and we did not encounter any issues running an rsync job at a frequency of even less than a minute. We made sure the rsync job ran on the standby server (we were pulling the WAL archives to the standby):

1. compress the WAL archives
2. rsync to the standby site
3. uncompress on the standby site

Thanks Venkat On Sat, Sep 17, 2011 at 6:06 PM, MirrorX wrote: > just another update since the system is up and running and one more > question > :p > > the secondary server is able to restore the wal archives practically > immediately after they arrive. i have set a rsync cron job to send the new > wals every 5 minutes. the procedure to transfer the files and to restore > them takes about 30 seconds (the number of archives is about 20-30). i ve > tried to set it to 2 minutes, and then the procedure takes about 20 seconds > (both transfer and restoration) while i didnt notice any impact on the > primary server (the procedure is initiated on the secondary server). what > is > your opinion about the time interval that the cron job should run? i ve > read many articles online indicating that rsync should not run every 1 > minute, but in my case isn't it different since it just syncs two folder > containing only wals and not the whole disks? plus both folders on the > servers are in different partitions. > thx in advance for your insight > > -- > View this message in context: > http://postgresql.1045698.n5.nabble.com/warm-standby-apply-wal-archives-tp4770567p4813659.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
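The compress / rsync / uncompress steps above can be sketched as a small loop. Directory names are hypothetical placeholders, and the rsync transfer is left as a comment (it needs real hosts); the gzip round trip is the shape of the pipeline.

```shell
# Sketch of the compress -> rsync -> uncompress WAL shipping pipeline.
# src_dir and dst_dir are hypothetical; in production src_dir is the
# primary's WAL archive directory and dst_dir lives on the standby.
ship_wal() {
    src_dir=$1
    dst_dir=$2
    for wal in "$src_dir"/*; do
        [ -f "$wal" ] || continue
        gzip -c "$wal" > "$dst_dir/$(basename "$wal").gz"   # 1. compress
        # 2. in real use, an rsync from the standby pulls "$dst_dir"/*.gz here
        gunzip -f "$dst_dir/$(basename "$wal").gz"          # 3. uncompress
    done
}
```

Each 16 MB segment typically compresses to a few MB, which is what makes the sub-minute cron frequency discussed above cheap on the network.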
Re: [GENERAL] Has Pg 9.1.0 been released today?
Yes. I would be excited to know if there is a possibility of a multi-master replication system for Postgres. We will soon be using 9.1 streaming replication. Thanks Venkat On Tue, Sep 13, 2011 at 1:31 AM, Aleksey Tsalolikhin < atsaloli.t...@gmail.com> wrote: > Congratulations on the release of 9.1.0! > > Lots of great features, I for one can't wait to try out unlogged > tables, that should help a lot in our environment. > > Now that you have streaming replication both async and sync, are you > working on multi-master replication? *excited* Or what's the > roadmap? > > Thanks again and keep up the great work! > > Aleksey > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] warm standby - apply wal archives
Considering the size of the WAL archives (200 GB per day), compressing them using gzip (you can run the command from a shell script and place it in archive_command as well) would possibly reduce the size to as low as 10 - 20 GB. Please let us know the results. Thanks Venkat On Tue, Sep 6, 2011 at 1:03 PM, MirrorX wrote: > The network bandwidth between the servers is definitely not an issue. What > is > bothering me is the big size of the wal archives, which goes up to 200GB > per > day and if the standby server will be able to replay all these files. The > argument that; since the master can do it and also do various other tasks > at > the same time, and since the secondary is identical to the first, so he > should be able to do that seems valid, so i will give it a try and let you > know about the results. In the meantime if there are any other > ideas/suggestions etc please let me know. thx to all > > -- > View this message in context: > http://postgresql.1045698.n5.nabble.com/warm-standby-apply-wal-archives-tp4770567p4773498.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] warm standby - apply wal archives
In my experience, I had configured a warm standby for a 2 TB Postgres cluster (PostgreSQL 8.4). Note: I do not know your database size and WAL archive generation rate. The important considerations I made were as follows - 1. WAL archive transfer from production to standby depends on the network bandwidth (I think you said there is no issue there) and the size of the WAL archives. 2. The transfer rate can be optimized by compressing the WAL files. Each WAL file would reduce from 16 MB to 2 - 3 MB (only in the case of a warm standby; with streaming replication the size would only decrease to 6 or 7 MB), which makes a huge difference for the network bandwidth. Compress the WAL archives on production, then transfer & uncompress the WALs on the standby. I did this successfully. Hope this helps! Thanks Venkat On Tue, Sep 6, 2011 at 2:57 AM, MirrorX wrote: > the nodes communicate through 4Gbps ethernet so i dont think there is an > issue there. probably some kind of misconfiguration of DRBD has occured. i > will check on that tommorow. thx a lot :) > > -- > View this message in context: > http://postgresql.1045698.n5.nabble.com/warm-standby-apply-wal-archives-tp4770567p4772126.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] Postgresql-9.0.1 Recovery
Thanks Craig! Below is what I did - 1. pg_start_backup() 2. rsync the data dir 3. pg_stop_backup() I believe the backup is valid because I was able to bring up the cluster without any issues (of course, with data loss). Positive signs - I am able to bring up the cluster with the online backup, but only with the loss of data. Negative signs, and things to watch for closely when testing backups - - pg_clog files were not synced. I suspect they were being written to at the time of the backup; I might have synced the data dir while the pg_clog files were half filled. - Though the WAL archives are there, Postgres does not try to recover beyond the timestamp at which pg_clog was missing. - Even if I replace the missing pg_clog files (which I did), Postgres asks for the corresponding WAL archive files. Yes. What I learnt is that we need to ensure that all the pg_clog files are fully copied as of the backup time. We cannot afford to miss any of them. Thanks Venkat On Wed, Aug 31, 2011 at 5:46 AM, Craig Ringer wrote: > On 30/08/2011 6:59 PM, Venkat Balaji wrote: > >> Hello Everyone, >> >> I have a situation here - >> >> I am trying to restore the production online backup and recover the same. >> >> - I had initially rsynced (excluded pg_log) the data directory and the >> tarred and zipped the same >> > > Did you do that after pg_start_backup() or on a stopped database server? > > If you did it on a running database server without first running > pg_start_backup(), your backup is invalid. > > Personally I like to take my base backups from an LVM snapshot of the > datadir just to be extra safe. That isn't necessary, though, and a regular > rsync or tar or whatever of a datadir after pg_start_backup() is fine. > > Remember to run pg_stop_backup() afterwards. > > > - I got an error "unable to read from pg_clog location" >> (file size is around 160K) >> > > ... from PostgreSQL, when you tried to start it? > > What emitted that error message? 
> > > What i understood is that, rsync some how missed out on syncing the >> files in "pg_clog" so, i had manually coped the missing pg_clog file >> from production and tried recovery. >> > > That won't work. You need a consistent snapshot of all the files in the > data dir. You cannot just mix and match copies taken at different times. > > For efficiency reasons PostgreSQL will recycle used clog files. You can't > just copy a file over and hope that because it has the same name, it still > contains the data you want. > > Your backup *failed* at the point where you got an incomplete copy of the > data directory. > > > Do i need to get that particular wal archive which is before online >> backup time ? >> > > No, you need to get the missing clog files. If you cannot do that, try > using pg_resetxlog, but be aware that that may lose transactions and can > potentially cause corruption of tables and indexes. > > > By this experience what i understand is that Postgresql stores committed >> and uncommited transactions in pg_xlog / wal archive files and >> information (not the transaction data) about transaction commit status >> is stored in pg_clog. Am I correct ? >> > > That sounds right to me, but I don't know as much about how Pg stores > things as I should. > > > I am in the process of designing a disaster recovery planner for our >> productions systems. >> > > Congratulations! > > Be extremely glad this didn't happen in a real recovery scenario. This is a > marvellous example of why you should always test your backups - you actually > did, and found a problem that would've been a critical issue if the backup > were actually needed. > > -- > Craig Ringer >
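Craig's central point in this thread is that the whole data directory must come from one consistent copy taken inside the pg_start_backup()/pg_stop_backup() window. A sketch of that procedure is below; the psql calls are shown as comments, directory names are hypothetical, and the tar round trip mirrors the "tarred and zipped" approach from the original post.

```shell
# Sketch: take one atomic tarball of the data directory (minus pg_log),
# then verify the copy. Directory and label names are hypothetical.
snapshot_datadir() {
    datadir=$1
    out_tar=$2
    # In real use, bracket the copy with:
    #   psql -c "SELECT pg_start_backup('base_backup', true);"
    tar -czf "$out_tar" --exclude='pg_log' -C "$datadir" .
    #   psql -c "SELECT pg_stop_backup();"
}

restore_datadir() {
    in_tar=$1
    target=$2
    mkdir -p "$target"
    tar -xzf "$in_tar" -C "$target"
}
```

Because the whole tree is captured in a single tar invocation, pg_clog cannot end up as a mix of files copied at different times - which is exactly the failure mode described above.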
[GENERAL] Postgresql-9.0.1 Recovery
Hello Everyone, I have a situation here - I am trying to restore a production online backup and recover it. - I had initially rsynced the data directory (excluding pg_log) and then tarred and zipped it - SCP'd the tar to a different server, then untarred and unzipped it - I got an error "unable to read from pg_clog location" (the file size is around 160K) I have a backup as of Aug 24th, and the pg_clog file which PG is unable to read is as of Aug 5th. I am aware that the size of the files in pg_clog is supposed to be 256K. What I understood is that rsync somehow missed syncing the files in "pg_clog", so I manually copied the missing pg_clog file from production and tried recovery. To my surprise, PG recovery then asked for the corresponding pg_xlog (WAL archive) file as of Aug 5th. Is there a way to recover this? Do I need to get that particular WAL archive from before the online backup time? I do have all the other files up to Aug 24th. From this experience, what I understand is that PostgreSQL stores committed and uncommitted transactions in pg_xlog / WAL archive files, and information (not the transaction data) about transaction commit status is stored in pg_clog. Am I correct? I am in the process of designing a disaster recovery plan for our production systems. Version - PG 9.0.1 OS - RHEL 5 Please advise! Regards, Venkat
Re: [GENERAL] heavy swapping, not sure why
It is recommended to identify the processes that genuinely need a high work_mem and set work_mem to a higher value only at the session level for those. In this case, all the connections using the maximum work_mem are the potential threat. As said by Zoltán, work_mem is very high, and shared_buffers as well. Other considerations would be the following - - the allocated kernel memory settings (like shmmax and shmall) - how much memory is used by system-level processes (like root and non-pg users) - it also depends on the database size and the amount of data being accessed across CPUs and memory - we need to ensure that unnecessary data is not being read into memory (queries hitting non-vacuumed tables, slow-performing queries, unnecessary full table scans etc.) Regards, Venkat On Tue, Aug 30, 2011 at 3:30 PM, Boszormenyi Zoltan wrote: > Hi, > > On 2011-08-29 22:36, Lonni J Friedman wrote: > > ... I read that > > (max_connections * work_mem) should never exceed physical RAM, and if > > that's accurate, then I suspect that's the root of my problem on > > systemA (below). > > work_mem is process-local memory so > > (max_connections * work_mem) < (physical RAM - shared_buffers) > > Some queries may allocate multiples of work_mem, too. > > Also, the kernel uses some memory for internal accounting, caching > and you need to account for the process binary in memory. > > > However, I'd like confirmation before I start > > tweaking things, as one of these servers is in production, and I can't > > easily tweak settings to experiment (plus this problem takes a few > > weeks before swapping gets bad enough to impact performance). 
> > > > A few examples: > > > > 0) system A: 56GB RAM, running postgresql-8.4.8 with the following > parameters: > > maintenance_work_mem = 96MB > > effective_cache_size = 40GB > > work_mem = 256MB > > wal_buffers = 16MB > > shared_buffers = 13GB > > max_connections = 300 > > RAM (56GB) - shared_buffers (13GB) = 43GB > > which is less than > > work_mem * max_connections = 300 * 0.25GB = 75GB > > The system would start swapping before 43GB/0.25GB = 172 clients. > > > 1) system B: 120GB RAM, running postgresql-9.0.4 with the following > parameters: > > maintenance_work_mem = 1GB > > effective_cache_size = 88GB > > work_mem = 576MB > > wal_buffers = 4MB > > shared_buffers = 28GB > > max_connections = 200 > > Similarly: > > 120GB - 28GB = 92GB > > is less than > > work_mem * max_connections = 200 * 576MB = 112.5GB > > Also, if you run anything else on the machine then the system would start > swapping much sooner than hitting max_connections number of clients. > > I would never set work_mem that high by default. 8 - 16MB is usually > enough for the common case and you can set work_mem for special > queries from the client and then reset it. > > Best regards, > Zoltán Böszörményi > > -- > -- > Zoltán Böszörményi > Cybertec Schönig & Schönig GmbH > Gröhrmühlgasse 26 > A-2700 Wiener Neustadt, Austria > Web: http://www.postgresql-support.de > http://www.postgresql.at/ > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
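The arithmetic Zoltán walks through above generalizes to a simple pre-flight check. This is a sketch only: sizes are in MB, and the threshold is exactly his rule of thumb (worst-case work_mem usage must fit in RAM minus shared_buffers, ignoring queries that allocate multiples of work_mem).

```shell
# Sketch: flag configurations where work_mem * max_connections can
# exceed the RAM left over after shared_buffers. All sizes in MB.
check_mem() {
    ram_mb=$1; shared_buffers_mb=$2; work_mem_mb=$3; max_conn=$4
    headroom=$((ram_mb - shared_buffers_mb))
    worst_case=$((work_mem_mb * max_conn))
    if [ "$worst_case" -gt "$headroom" ]; then
        echo "RISK: worst case ${worst_case}MB > headroom ${headroom}MB"
        return 1
    fi
    echo "OK: worst case ${worst_case}MB <= headroom ${headroom}MB"
}

# System A from the thread: 56GB RAM, 13GB shared_buffers, 256MB work_mem,
# 300 connections -> worst case 75GB against 43GB of headroom, hence the swapping.
#   check_mem 57344 13312 256 300
```

This reproduces Zoltán's numbers: system A starts swapping well before reaching max_connections.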