[GENERAL] pg_ctl kill QUIT
Hello Community,

We used "pg_ctl kill QUIT <pid>" to terminate one of the processes on the production database, and the database went into recovery mode. We understand that we should not use "kill -9", and we did not do that. Could you please help us avoid this problem permanently?

Regards,
Venkat

--
DISCLAIMER: Please note that this message and any attachments may contain confidential and proprietary material and information and are intended only for the use of the intended recipient(s). If you are not the intended recipient, you are hereby notified that any review, use, disclosure, dissemination, distribution or copying of this message and any attachments is strictly prohibited. If you have received this email in error, please immediately notify the sender and delete this e-mail, whether electronic or printed. Please also note that any views, opinions, conclusions or commitments expressed in this message are those of the individual sender and do not necessarily reflect the views of *Ver sé Innovation Pvt Ltd*.
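For background: sending SIGQUIT (which is what "pg_ctl kill QUIT" does) to a single backend makes the postmaster treat that backend as crashed and put the whole cluster through crash recovery. A gentler way to end one session is from SQL, using the functions below. This is a hedged sketch; the pid 12345 is a hypothetical placeholder, and the column names shown are the 9.0-era ones (procpid, current_query):

```sql
-- Identify the offending session first (9.0-era column names).
SELECT procpid, usename, current_query FROM pg_stat_activity;

-- Cancel only the running query (sends SIGINT):
SELECT pg_cancel_backend(12345);

-- Or terminate the whole session cleanly (sends SIGTERM):
SELECT pg_terminate_backend(12345);
```

Unlike SIGQUIT, SIGTERM lets the backend exit through the normal shutdown path, so the rest of the cluster keeps running.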
Re: [GENERAL] : Postgresql Error after recovery
I restarted the cluster with "ignore_system_indexes=true" and was able to connect to the databases. I have started re-indexing, and it seems to be working fine. I will get back if I find further issues.

Regards,
Venkat

On Wed, Jul 4, 2012 at 3:35 PM, Raghavendra <raghavendra@enterprisedb.com> wrote:
>> Re-index option did not work.
>>
>> Single user mode option did not work as well -
>>
>> [postgres@localhost bin]$ postgres --single oltp_db -E -D /usr/local/postgresql-9.0.1/data
>> 2012-07-04 04:30:47 CDT [26072]: [1-1] user=,db= FATAL: index "pg_attribute_relid_attnum_index" contains unexpected zero page at block 0
>> 2012-07-04 04:30:47 CDT [26072]: [2-1] user=,db= HINT: Please REINDEX it.
>>
>> The last option would be dump and restore. We need to avoid that to reduce the downtime.
>
> How about starting the instance with ignore_system_indexes=true, and then doing the dump & restore? I am not sure about this procedure; someone here would have a better idea.
>
> http://www.postgresql.org/docs/9.0/interactive/runtime-config-developer.html
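The sequence that worked here can be sketched as follows. The data directory and database name are taken from the thread; the exact REINDEX strategy is an assumption (REINDEX SYSTEM rebuilds the catalog indexes of one database):

```shell
# Start with system indexes ignored so sessions can get past the
# corrupted catalog index (a developer option; see the 9.0 docs).
pg_ctl -D /usr/local/postgresql-9.0.1/data \
       -o "-c ignore_system_indexes=true" start

# Rebuild the catalog indexes of the affected database, then the rest.
psql -d oltp_db -c 'REINDEX SYSTEM oltp_db;'
reindexdb --all

# Restart without the developer option once reindexing completes.
pg_ctl -D /usr/local/postgresql-9.0.1/data restart
```

Note that ignore_system_indexes makes catalog lookups slow (sequential scans), so this is strictly a repair-window setting, not something to leave enabled.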
Re: [GENERAL] : Postgresql Error after recovery
On Wed, Jul 4, 2012 at 2:12 PM, Raghavendra <raghavendra@enterprisedb.com> wrote:
>>> postgres=# \c oltp_db
>>> FATAL: index "pg_attribute_relid_attnum_index" contains unexpected zero page at block 0
>>> HINT: Please REINDEX it.
>>> Previous connection kept
>>>
>>> Can anyone please help us?
>>
>> You have a HINT given to REINDEX it. Use reindexdb from the OS.

Re-index option did not work. Single user mode option did not work as well -

[postgres@localhost bin]$ postgres --single oltp_db -E -D /usr/local/postgresql-9.0.1/data
2012-07-04 04:30:47 CDT [26072]: [1-1] user=,db= FATAL: index "pg_attribute_relid_attnum_index" contains unexpected zero page at block 0
2012-07-04 04:30:47 CDT [26072]: [2-1] user=,db= HINT: Please REINDEX it.

The last option would be dump and restore. We need to avoid that to reduce the downtime.

Regards,
Venkat
[GENERAL] : Postgresql Error after recovery
Hello Community,

We have a critical situation wherein our production database server got affected by a "root kit". When we tried to build a replication site by copying the data directory to a different server, many files were missed during the copy (due to the root kit). So we moved the missing files individually, one by one, and the situation improved a bit. We are now stuck with the following issue: we are unable to connect to the databases. When we try to do so, we get the error below -

postgres=# \c oltp_db
FATAL: index "pg_attribute_relid_attnum_index" contains unexpected zero page at block 0
HINT: Please REINDEX it.
Previous connection kept

Can anyone please help us?

Regards,
VB
Re: [GENERAL] Measuring replication lag time
On Wed, Feb 22, 2012 at 5:40 PM, Stuart Bishop wrote:
> Hi.
>
> I need to measure how far in the past a hot standby is, async
> streaming replication.
>
> On the Hot Standby, "select
> age(current_timestamp,pg_last_xact_replay_timestamp())" gets me this
> (or close enough for my purposes - I understand that if there are no
> updates, there are no logs to replay and the lag time will increase).
>
> Is there some way to get this same information on the master?
> pg_stat_replication contains the log information, but I can't see how
> to map this to a timestamp.
>
> Is there a better way of measuring this?

Comparing "pg_controldata" output on the production and standby servers might help you with this.

Thanks, VB
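On the master there is no replay timestamp, but WAL positions can be compared from pg_stat_replication (9.1+). A sketch of both sides; note pg_xlog_location_diff() only appeared in 9.2, so on 9.1 the two locations have to be compared by hand:

```sql
-- On the standby: time-based lag.
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;

-- On the master (9.2+): byte-based lag per connected standby.
SELECT application_name, state,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             replay_location) AS byte_lag
FROM pg_stat_replication;
```

The byte lag answers "how much WAL is unreplayed", which is independent of whether the master is currently idle, whereas the timestamp query grows during idle periods as the poster notes.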
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
On Thu, Feb 16, 2012 at 8:14 PM, Adrian Klaver wrote:
> On Wednesday, February 15, 2012 10:21:02 pm Venkat Balaji wrote:
> > Adrian,
> >
> > Thanks a lot !
> >
> > > So in this case you are not waiting for confirmation of the commit being
> > > flushed to disk on the standby. In that case you are bypassing the primary
> > > reason for sync replication. The plus is transactions on the master will
> > > complete faster and do so in the absence of the standby. The minus is that
> > > you are in sort of an in-between state.
> >
> > I understand. My worry and requirement is to ensure the master is not
> > disturbed for any reason.
> > In sync rep, the biggest worry is that if the standby server is unavailable
> > and down for a long time, the master hangs and stays in that state until the
> > standby comes back up, or replication must be broken temporarily (until the
> > standby comes back up) so that the master runs without interruption. This is
> > a costly exercise on production from a downtime perspective.
>
> So just use regular streaming replication without sync rep. You get record-based
> transaction shipping without having to wait for the standby. You will need to
> make sure that wal_keep_segments is big enough to cover any downtime on the
> standby (you would need that for sync rep also).

As we already have streaming replication configured, we have rolled back the plan of setting up synchronous replication.

Thanks, VB
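A minimal master-side configuration for the plain streaming setup recommended in this thread might look like the fragment below. The wal_keep_segments value is an assumption and should be sized to cover the longest expected standby outage:

```
# postgresql.conf on the master (9.1)
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 256        # 256 * 16MB = 4GB of retained WAL
# synchronous_standby_names left unset => asynchronous replication
```

If the standby falls further behind than wal_keep_segments can cover, it must be re-seeded from a base backup (or fed from a WAL archive), so the value trades disk space against tolerance for standby downtime.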
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
Adrian,

Thanks a lot !

> So in this case you are not waiting for confirmation of the commit being
> flushed to disk on the standby. In that case you are bypassing the primary
> reason for sync replication. The plus is transactions on the master will
> complete faster and do so in the absence of the standby. The minus is that
> you are in sort of an in-between state.

I understand. My worry and requirement is to ensure the master is not disturbed for any reason. In sync rep, the biggest worry is that if the standby server is unavailable and down for a long time, the master hangs and stays in that state until the standby comes back up, or replication must be broken temporarily (until the standby comes back up) so that the master runs without interruption. This is a costly exercise on production from a downtime perspective.

> Personally, I take sync replication to be basically an all or nothing
> proposition. By setting it up you are saying you want, at minimum, two
> database clusters to be in sync at any point in time all the time (except
> for start up). If that is not possible then you are really looking for
> async replication.

Yeah. We will need to make a decision accordingly.

Thanks again, VB
Re: [GENERAL] High checkpoint_segments
On Wed, Feb 15, 2012 at 4:12 PM, Andres Freund wrote:
> On Wednesday, February 15, 2012 10:38:23 AM Venkat Balaji wrote:
> > On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe wrote:
> > > On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji <venkat.bal...@verse.in>
> > > > all of these 1000 files get filled up in less than 5 mins, there are
> > > > chances that the system will slow down due to high IO and CPU.
> > > As far as I know there is no data loss issue with a lot of checkpoint
> > > segments.
> > Data loss would be an issue when there is a server crash or pg_xlog crash,
> > etc. That many pg_xlog files (1000) would contribute to a huge data loss
> > (data changes not synced to the base are not guaranteed). Of course, this
> > is not related to the current situation. Normally we calculate the
> > checkpoint completion time, IO pressure, CPU load and the threat of data
> > loss when we configure checkpoint_segments.
> I think you might be misunderstanding something. A high number of
> checkpoint_segments can lead to slower recovery - all those changes need to be
> reapplied - but it won't lead to lost data. The data inside the WAL will be
> fsynced at appropriate times (commit; background writer; too much written).

Recovery would take time because all the changes in the WAL files of pg_xlog (which are many) must be replayed to reach a consistent state. When disaster strikes, if the pg_xlog files are not available and the changes in WAL have not yet been synced to the data files, then recovery is not possible and the data loss will be huge. It also depends on how much data has not been synced.

Thanks, VB
Re: [GENERAL] High checkpoint_segments
> > Data loss would be an issue when there is a server crash or pg_xlog crash,
> > etc. That many pg_xlog files (1000) would contribute to a huge data loss
> > (data changes not synced to the base are not guaranteed). Of course, this
> > is not related to the current situation. Normally we calculate the
> > checkpoint completion time, IO pressure, CPU load and the threat of data
> > loss when we configure checkpoint_segments.
>
> So you're saying that by using a small number of checkpoint segments you
> limit the data loss when the WAL gets corrupted/lost? That's a bit like
> buying a Maserati and then not going faster than 10mph because you might
> crash at higher speeds ...

No. I am not saying that checkpoint_segments must be lower. I was just trying to explain the IO overhead of setting high (as high as 1000) checkpoint segments. A lower number of checkpoint segments will lead to more frequent IO, which is not good. Agreed.

> The problem here is that the WAL is usually placed on more reliable drives
> (compared to the data files) or a RAID1 array, and as it's just writing
> data sequentially, the usage pattern is much less likely to cause
> data/drive corruption (compared to data files that need to handle a lot of
> random I/O, etc.).

Agreed.

> So while it is possible the WAL might get corrupted, the probability of data
> file corruption is much higher. And the corruption might easily happen
> silently during a checkpoint, so there won't be any WAL segments no matter
> how many of them you keep ...

Agreed. When corruption occurs, it really does not matter how many WAL segments are kept in pg_xlog. But, at any point of time, if PG needs

> And by using a low number of checkpoint segments it actually gets worse,
> because it means more frequent checkpoints -> more I/O on the drives ->
> more wearout of the drives etc.

Completely agreed, as mentioned above. I choose checkpoint_segments and checkpoint_timeout once I observe the checkpoint behavior.
> If you need to protect yourself against this, you need to keep a WAL
> archive (preferably on a separate machine) and/or a hot standby for
> failover.

WAL archiving is a different situation, wherein you back up the pg_xlog files by enabling archiving. I was referring to a specific situation wherein the pg_xlog files are not archived and the data has not yet been synced to the base files (by the bgwriter) when the system crashes. PG would then depend on pg_xlog to recover and reach a consistent state; if pg_xlog is also not available, there would be data loss, and how much depends on how much data is present in the pg_xlog files.

Thanks, VB
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
In short, I would like to understand whether I am achieving the same asynchronous streaming replication by setting synchronous_commit='local'. I understand that streaming replication is record-based log-shipping. Below is what shows up on our primary test server where we are testing synchronous replication -

*1. Synchronous setup enabled with synchronous_commit='local'*

postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
procpid          | 24099
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      |
client_hostname  |
client_port      | 56432
backend_start    | 2012-02-15 12:55:39.65663+03
state            | streaming
sent_location    | 0/E78
write_location   | 0/E78
flush_location   | 0/E78
replay_location  | 0/E78
sync_priority    | 1
sync_state       | *sync*

postgres=# show synchronous_commit ;
 synchronous_commit
--------------------
 *local*
(1 row)

postgres=# show synchronous_standby_names ;
 synchronous_standby_names
---------------------------
 *
(1 row)

Does this mean that the system is still replicating synchronously? If yes, by what means?

*Below is our actual production setup in 9.1.1 with asynchronous replication -*

*2. Asynchronous enabled with synchronous_commit='on'*

psql (9.1.1)
Type "help" for help.

postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
procpid          | 3159
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      |
client_hostname  |
client_port      | 40165
backend_start    | 2012-02-08 12:41:51.858897+03
state            | streaming
sent_location    | 1/86F83B50
write_location   | 1/86F83B50
flush_location   | 1/86F83B50
replay_location  | 1/86F83B50
sync_priority    | 0
sync_state       | *async*

postgres=# show synchronous_commit ;
 synchronous_commit
--------------------
 on
(1 row)

postgres=# show synchronous_standby_names ;
 synchronous_standby_names
---------------------------

(1 row)

Operation-wise, I am not seeing much difference when inserting a few thousand rows; the behavior is almost the same in both async and sync rep.
Thanks, VB On Wed, Feb 15, 2012 at 11:11 AM, Venkat Balaji wrote: > On Wed, Feb 15, 2012 at 11:01 AM, Venkat Balaji wrote: > >> >> On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver >> wrote: >> >>> On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote: >>> > Hello, >>> > >>> > Disaster Recovery testing for Synchronous replication setup - >>> > >>> > When the standby site is down, transactions at the production site >>> started >>> > hanging (this is after the successful setup of synchronous >>> replication). >>> > >>> > We changed synchronous_commit to 'local' to over-come this situation. >>> > >>> > - No transactions are hanging at the production site even when the >>> standby >>> > is down >>> > - Standby is automatically getting synced when it is back up again. >>> > >>> > Can someone let us know if there are any "-ve" effects of putting >>> > synchronous_commit='local' ?? >>> > >>> > I am assuming that this as good as putting "synchronous_commit=on" on >>> an >>> > stand-alone system. >>> >>> It would seem you are really after streaming replication(which is >>> asynchronous) >>> more than synchronous replication. I have not used synchronous >>> replication >>> enough to be sure, but I think by setting synchronous_commit='local' >>> you are >>> basically turning the system into a straight streaming(asynchronous) >>> system >>> anyway. >>> >> >> Sorry. Ignore my earlier message - > > Yeah. Its a kind of asynchronous ( at the transaction level, NOT WAL based > ). All i wanted to achieve is as follows - > > 1. Synchronous replication - which would perform transactions > simultaneously on production and standby. > 2. Ideally, if the commit does not occur at the standby site, then it > would not commit at the production as well, which w
Re: [GENERAL] High checkpoint_segments
On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe wrote:
> On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji <venkat.bal...@verse.in> wrote:
> >
> > On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt wrote:
> >>
> >> We need to do a few bulk updates as Rails migrations. We're a typical
> >> read-mostly web site, so at the moment, our checkpoint settings and WAL are
> >> all default (3 segments, 5 min, 16MB), and updating a million rows takes 10
> >> minutes due to all the checkpointing.
> >>
> >> We have no replication or hot standbys. As a consumer-web startup, with
> >> no SLA, and not a huge database, and if we ever do have to recover from
> >> downtime it's ok if it takes longer.. is there a reason NOT to always run
> >> with something like checkpoint_segments = 1000, as long as I leave the
> >> timeout at 5m?
> >
> > Checkpoints will still occur every 5 minutes. Anyway,
> > checkpoint_segments=1000 is huge; this implies 16MB * 1000 = 16000MB worth
> > of pg_xlog data, which is not advisable from an I/O perspective or a
> > data-loss perspective. Even in the most unimaginable case, if all of these
> > 1000 files fill up in less than 5 minutes, there is a chance the system
> > will slow down due to high IO and CPU.
>
> As far as I know there is no data loss issue with a lot of checkpoint
> segments.

Data loss would be an issue when there is a server crash or pg_xlog crash, etc. That many pg_xlog files (1000) could contribute to a huge data loss (data changes not yet synced to the base files are not guaranteed). Of course, this is not related to the current situation. Normally we consider the checkpoint completion time, IO pressure, CPU load, and the threat of data loss when we configure checkpoint_segments.
Re: [GENERAL] High checkpoint_segments
On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt wrote:
> We need to do a few bulk updates as Rails migrations. We're a typical
> read-mostly web site, so at the moment, our checkpoint settings and WAL are
> all default (3 segments, 5 min, 16MB), and updating a million rows takes 10
> minutes due to all the checkpointing.
>
> We have no replication or hot standbys. As a consumer-web startup, with
> no SLA, and not a huge database, and if we ever do have to recover from
> downtime it's ok if it takes longer.. is there a reason NOT to always run
> with something like checkpoint_segments = 1000, as long as I leave the
> timeout at 5m?

Checkpoints will still occur every 5 minutes. Anyway, checkpoint_segments=1000 is huge; this implies 16MB * 1000 = 16000MB worth of pg_xlog data, which is not advisable from an I/O perspective or a data-loss perspective. Even in the most unimaginable case, if all of these 1000 files fill up in less than 5 minutes, there is a chance the system will slow down due to high IO and CPU.

You may think of increasing checkpoint_timeout as well, but some monitoring and analysis is needed to arrive at a number. What does pg_stat_bgwriter say about checkpoints? Do you have log_checkpoints enabled?

Thanks VB
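The checkpoint monitoring suggested above can start with the pg_stat_bgwriter view; a sketch (column names as of the 9.x series):

```sql
-- Timed vs. requested checkpoints: a high checkpoints_req count means
-- checkpoint_segments is filling up before checkpoint_timeout expires,
-- i.e. the segment count is the binding limit.
SELECT checkpoints_timed,   -- triggered by checkpoint_timeout
       checkpoints_req,     -- triggered by running out of segments
       buffers_checkpoint,  -- buffers written at checkpoint time
       buffers_backend      -- buffers backends had to write themselves
FROM pg_stat_bgwriter;
```

With log_checkpoints = on in postgresql.conf, each checkpoint additionally logs how many buffers it wrote and how long the write and sync phases took, which helps choose a segment count empirically rather than guessing.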
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
On Wed, Feb 15, 2012 at 11:01 AM, Venkat Balaji wrote:
>
> On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver wrote:
>
>> On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote:
>> > Hello,
>> >
>> > Disaster Recovery testing for Synchronous replication setup -
>> >
>> > When the standby site is down, transactions at the production site started
>> > hanging (this is after the successful setup of synchronous replication).
>> >
>> > We changed synchronous_commit to 'local' to overcome this situation.
>> >
>> > - No transactions are hanging at the production site even when the standby
>> > is down
>> > - Standby is automatically getting synced when it is back up again.
>> >
>> > Can someone let us know if there are any "-ve" effects of putting
>> > synchronous_commit='local' ??
>> >
>> > I am assuming that this is as good as putting "synchronous_commit=on" on a
>> > stand-alone system.
>>
>> It would seem you are really after streaming replication (which is
>> asynchronous) more than synchronous replication. I have not used synchronous
>> replication enough to be sure, but I think by setting
>> synchronous_commit='local' you are basically turning the system into a
>> straight streaming (asynchronous) system anyway.
>
> Sorry. Ignore my earlier message -

Yeah, it is a kind of asynchronous (at the transaction level, not WAL based). All I wanted to achieve is as follows -

1. Synchronous replication, which would perform transactions simultaneously on production and standby.
2. Ideally, if the commit does not occur at the standby site, then it would not commit at production either, which will cause the production site to hang. I do not want the production site to hang if the standby site is down or not accessible.
3. I would need the commit to occur on production, and the production apps should not be disturbed if the standby fails to respond.
To achieve this, I have set synchronous_commit='local' to ensure that transactions are committed at the production site first. We do have streaming replication (on PG 9.0) set up on our other production boxes, which is asynchronous and WAL based.

Thanks VB
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
On Tue, Feb 14, 2012 at 8:09 PM, Adrian Klaver wrote:
> On Tuesday, February 14, 2012 4:21:22 am Venkat Balaji wrote:
> > Hello,
> >
> > Disaster Recovery testing for Synchronous replication setup -
> >
> > When the standby site is down, transactions at the production site started
> > hanging (this is after the successful setup of synchronous replication).
> >
> > We changed synchronous_commit to 'local' to overcome this situation.
> >
> > - No transactions are hanging at the production site even when the standby
> > is down
> > - Standby is automatically getting synced when it is back up again.
> >
> > Can someone let us know if there are any "-ve" effects of putting
> > synchronous_commit='local' ??
> >
> > I am assuming that this is as good as putting "synchronous_commit=on" on a
> > stand-alone system.
>
> It would seem you are really after streaming replication (which is
> asynchronous) more than synchronous replication. I have not used synchronous
> replication enough to be sure, but I think by setting
> synchronous_commit='local' you are basically turning the system into a
> straight streaming (asynchronous) system anyway.

Yeah. It's a kind of asynchronous. All I wanted is as follows - 1

> > We need to get this setup live on production shortly.
> >
> > Thanks
> > VB
>
> --
> Adrian Klaver
> adrian.kla...@gmail.com
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
Hello,

Disaster Recovery testing for the synchronous replication setup -

When the standby site is down, transactions at the production site started hanging (this is after the successful setup of synchronous replication). We changed synchronous_commit to 'local' to overcome this situation.

- No transactions are hanging at the production site even when the standby is down
- Standby is automatically getting synced when it is back up again.

Can someone let us know if there are any "-ve" effects of putting synchronous_commit='local'? I am assuming that this is as good as putting "synchronous_commit=on" on a stand-alone system. We need to get this setup live on production shortly.

Thanks VB

On Fri, Feb 10, 2012 at 4:47 PM, Venkat Balaji wrote:
> This issue is now resolved!
>
> The statements are no longer hanging on production :)
>
> The suspected problem was - our brand new production server did not have
> port 5432 open. I opened the port using the "iptables" command and
> everything started working.
>
> Synchronous replication is fast and awesome.
>
> Thanks
> VB
>
> On Fri, Feb 3, 2012 at 9:45 PM, Adrian Klaver wrote:
>> On Thursday, February 02, 2012 10:21:28 pm Venkat Balaji wrote:
>> >
>> > Connection is working fine between primary and standby, ping is working
>> > fine and WAL archive file transfer is working without any issues.
>> >
>> > I tried CREATE TABLE and CREATE DATABASE; both were hanging.
>> >
>> > Apart from regular streaming replication settings, I did the following on
>> > primary to enable synchronous replication -
>> >
>> > synchronous_standby_names='*'
>> >
>> > Commands started hanging after that. Is there anything else I need to do?
>>
>> From here:
>> http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html
>>
>> "
>> synchronous_standby_names (string)
>> ...
The synchronous standby will be the first standby named in this list >> that is >> both currently connected and streaming data in real-time (as shown by a >> state of >> streaming in the pg_stat_replication view). Other standby servers >> appearing >> later in this list represent potential synchronous standbys >> >> The name of a standby server for this purpose is the application_name >> setting of >> the standby, as set in the primary_conninfo of the standby's walreceiver. >> There >> is no mechanism to enforce uniqueness. In case of duplicates one of the >> matching >> standbys will be chosen to be the synchronous standby, though exactly >> which one >> is indeterminate. The special entry * matches any application_name, >> including >> the default application name of walreceiver. >> >> " >> >> So I would check the pg_stat_replication view to see if Postgres is >> seeing the >> standby as streaming. >> >> >> > >> > Thanks >> > VB >> >> -- >> Adrian Klaver >> adrian.kla...@gmail.com >> > >
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
This issue is now resolved!

The statements are no longer hanging on production :)

The suspected problem was - our brand new production server did not have port 5432 open. I opened the port using the "iptables" command and everything started working.

Synchronous replication is fast and awesome.

Thanks VB

On Fri, Feb 3, 2012 at 9:45 PM, Adrian Klaver wrote:
> On Thursday, February 02, 2012 10:21:28 pm Venkat Balaji wrote:
> >
> > Connection is working fine between primary and standby, ping is working
> > fine and WAL archive file transfer is working without any issues.
> >
> > I tried CREATE TABLE and CREATE DATABASE; both were hanging.
> >
> > Apart from regular streaming replication settings, I did the following on
> > primary to enable synchronous replication -
> >
> > synchronous_standby_names='*'
> >
> > Commands started hanging after that. Is there anything else I need to do?
>
> From here:
> http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html
>
> "
> synchronous_standby_names (string)
> ... The synchronous standby will be the first standby named in this list that
> is both currently connected and streaming data in real-time (as shown by a
> state of streaming in the pg_stat_replication view). Other standby servers
> appearing later in this list represent potential synchronous standbys.
>
> The name of a standby server for this purpose is the application_name setting
> of the standby, as set in the primary_conninfo of the standby's walreceiver.
> There is no mechanism to enforce uniqueness. In case of duplicates one of the
> matching standbys will be chosen to be the synchronous standby, though exactly
> which one is indeterminate. The special entry * matches any application_name,
> including the default application name of walreceiver.
> "
>
> So I would check the pg_stat_replication view to see if Postgres is seeing the
> standby as streaming.
> > > > > > Thanks > > VB > > -- > Adrian Klaver > adrian.kla...@gmail.com >
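The firewall fix described in this thread can be sketched as follows. The exact rule and the persistence command are assumptions (a RHEL-style iptables setup) and should be adapted to your distribution:

```shell
# Allow incoming PostgreSQL connections on the default port 5432.
iptables -A INPUT -p tcp --dport 5432 -j ACCEPT

# Persist the rule across reboots (RHEL/CentOS convention).
service iptables save
```

With the port closed, the standby's walreceiver could never connect, so synchronous commits on the primary waited forever; this is consistent with the hang described earlier in the thread.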
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
On Thu, Feb 2, 2012 at 8:37 PM, Adrian Klaver wrote:
> On Wednesday, February 01, 2012 10:51:44 pm Venkat Balaji wrote:
> > Hello,
> >
> > I was testing the Postgres 9.1.1 synchronous streaming replication on our
> > UAT system.
> >
> > Without synchronous replication, everything was working fine.
> >
> > But when I enabled synchronous_standby_names='*', "CREATE TABLE"
> > started hanging for a long time.
>
> Only the CREATE TABLE statement or all statements?
> In general terms, synchronous replication moves at the speed of the connection
> between the primary and standby, or does not occur if the standby can not be
> found. So what is the state of the connection between the primary and standby?

Connection is working fine between primary and standby, ping is working fine, and WAL archive file transfer is working without any issues.

I tried CREATE TABLE and CREATE DATABASE; both were hanging.

Apart from the regular streaming replication settings, I did the following on the primary to enable synchronous replication -

synchronous_standby_names='*'

Commands started hanging after that. Is there anything else I need to do?

Thanks VB
Re: [GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
synchronous_commit is "on" Thanks VB On Thu, Feb 2, 2012 at 12:31 PM, Raghavendra < raghavendra@enterprisedb.com> wrote: > What is the value of synchronous_commit ? > > --- > Regards, > Raghavendra > EnterpriseDB Corporation > Blog: http://raghavt.blogspot.com/ > > > > On Thu, Feb 2, 2012 at 12:21 PM, Venkat Balaji wrote: > >> Hello, >> >> I was testing the Postgres-9.1.1 synchronous streaming replication on our >> UAT system. >> >> Without synchronous replication, everything was working fine. >> >> But, when i enabled synchronous_replication_names='*', the "create table" >> started hanging for long time. >> >> When i pressed "Ctrl+C" i got the following message - >> >> Cancel request sent >> WARNING: canceling wait for synchronous replication due to user request >> DETAIL: The transaction has already committed locally, but might not >> have been replicated to the standby. >> CREATE TABLE >> >> Can someone please help us ? >> >> Thanks >> VB >> > >
[GENERAL] [GENERA]: Postgresql-9.1.1 synchronous replication issue
Hello,

I was testing the Postgres 9.1.1 synchronous streaming replication on our UAT system.

Without synchronous replication, everything was working fine. But when I enabled synchronous_standby_names='*', "CREATE TABLE" started hanging for a long time. When I pressed "Ctrl+C" I got the following message -

Cancel request sent
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
CREATE TABLE

Can someone please help us?

Thanks
VB
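For reference, a minimal 9.1 synchronous replication setup looks roughly like the fragment below; the host and application_name values are hypothetical placeholders:

```
# postgresql.conf on the primary
wal_level = hot_standby
max_wal_senders = 3
synchronous_standby_names = '*'   # or a specific name, e.g. 'standby1'

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=primary-host port=5432 application_name=standby1'
```

With '*', any connected standby can be chosen as the synchronous one, but if no standby is connected and streaming, every commit waits indefinitely, which matches the hang described in this message.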
[GENERAL]: streaming replication on PG-9.1.1
Hello, We have configured "streaming replication" (not synchronous) for our production on PG-9.1.1. Replication is working fine; we can see the transactions getting replicated without any issues. But I see the problem below - pg_stat_replication on the master shows no rows all the time. ps -Af | grep sender | grep -v grep -- has never shown a WAL sender process on the master ps -Af | grep receiver | grep -v grep -- has never shown a WAL receiver process on the slave Note: we do not have active transactions on the master all the time. Please help! Thanks VB
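To confirm whether a standby is actually streaming (rather than only replaying archived WAL files), the master can be queried directly; an empty result here means no WAL sender is attached, which would also explain the missing sender/receiver processes. A sketch against the 9.1 catalog (column names per that release):

```sql
-- On the master: one row per connected standby.
SELECT procpid, usename, application_name, client_addr,
       state, sent_location, replay_location, sync_state
FROM pg_stat_replication;
```

If this stays empty, the standby is likely falling back to restore_command only; checking primary_conninfo in the standby's recovery.conf and the replication entries in the master's pg_hba.conf is the usual next step.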
Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)
2011/12/2 Oleg Serov > And, I'm an idiot. > > My DB version: > PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) > 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit > > > > 2011/12/2 Oleg Serov > >> Hello, I have a problem. >> >> I've got a production server, working fine. Then I got a strange error: >> > ERROR: right sibling's left-link doesn't match: block 147 links to 407 >> instead of expected 146 in index "order_status_key"' >> I decided to back up the whole server, so I shut down the VPS and >> backed up all the data. >> Then, after I booted it, I found data loss. >> > This looks like index corruption. Did you try re-indexing? The index creation might have failed; re-indexing would rebuild the index tuples. If you suspect disk corruption, try re-creating the index (or CREATE INDEX CONCURRENTLY) on a different disk. > I've lost data that had been written to the DB over the last 10-100 hours >> (different tables have different last-updated values). >> >> Then I analyzed the log, and found this: >> 7 days ago these errors appeared: >> db= LOG: could not rename temporary statistics file >> "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat": >> db= WARNING: pgstat wait timeout >> ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076 > > This could be a free-space issue; do you have enough space on the disk holding "pg_stat_tmp"? 5 days ago: >> a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied --- >> flushed only to F/526512E0 >> 83238 db= WARNING: could not write block 54 of base/16384/2619 >> 83239 db= CONTEXT: writing block 54 of relation base/16384/2619 >> > And today: >> 18 db= LOG: could not open file "pg_xlog/0001000F0052" >> (log file 15, segment 82): >> 19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied --- >> flushed only to F/52FDF0E0 >> > >> Is there any way to recover the fresh data from the database? >> > What kind of backups do you have available? Thanks VB
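The re-index suggestion above can be sketched as follows (the table and column names are illustrative; only order_status_key is from the error message):

```sql
-- Rebuild just the damaged index (takes an exclusive lock on the table):
REINDEX INDEX order_status_key;
-- Alternatively, build a replacement alongside it without blocking writes
-- (assumed table/column), then drop the damaged index once the new one is valid:
CREATE INDEX CONCURRENTLY order_status_key_new ON orders (status);
-- DROP INDEX order_status_key;
```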
Re: [GENERAL] How to restore the table space tar files created by pg_basebackup?
Are the tablespace directories symlinked from the data directory (under pg_tblspc)? Thanks VB On Wed, Nov 30, 2011 at 7:42 PM, Samba wrote: > Hi all, > > I have taken a base backup of my master server using the pg_basebackup command > as below: > pg_basebackup -D /tmp/PostgresBackup/ -Ft -Z 9 -l masterbackup -h > localhost -U replication -w > > The above created 4 tar files, namely: 16394.tar.gz 16395.tar.gz > 16396.tar.gz base.tar.gz > > I do know that my database contains 3 table spaces in addition to > pg_default and pg_global ( I guess that is why it created those three > numbered tar.gz files, plus one base.tar.gz file ) and my master and > standby servers are identical by all means. > > Now, I'm not sure how I can restore these files on the standby server. I > could restore the base.tar.gz into the data directory on standby and the > streaming replication has started working properly. But I'm not sure what > to do with these additional numbered gz files, which contain the same data > that is already contained in the base.tar.gz file. > > Can someone explain what to do with these files? The documentation for > pg_basebackup does not mention this information; it just says that a > different variant of the command will fail if there are multiple table > spaces. > > Another related query is whether we can specify the name of the backup file > instead of leaving it to be base.tar.gz. > > Thanks and Regards, > Samba
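Each numbered tarball corresponds to one tablespace OID, the same OID that appears as a symlink under pg_tblspc/ in the data directory; on the standby each one is unpacked into the directory that symlink points to (base.tar.gz covers only the main data directory, so the numbered files are not redundant). A minimal sketch with throwaway paths, using OID 16394 from the backup above:

```shell
set -e
work=$(mktemp -d)
# Simulate one tablespace's contents on the master (16394 is the tablespace
# OID from the backup above; "12345" stands in for a relation file):
mkdir -p "$work/master_tblspc"
echo "relation data" > "$work/master_tblspc/12345"
# pg_basebackup -Ft produces one <OID>.tar.gz per tablespace:
tar -czf "$work/16394.tar.gz" -C "$work/master_tblspc" .
# On the standby: recreate the directory that the pg_tblspc/16394 symlink
# points to (same path as on the master) and unpack the matching tarball
# there before starting the server:
mkdir -p "$work/standby_tblspc"
tar -xzf "$work/16394.tar.gz" -C "$work/standby_tblspc"
ls "$work/standby_tblspc"   # prints: 12345
```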
[GENERAL] : pg_compresslog (pglesslog)
Hello Everyone, Can someone please help me know if there exists a "pglesslog" version for PG-9.0. I only see beta version (1.4.2) for pg9.0 being released sometime ago. Anyone using "pg_lesslog_1.4.2_pg90_beta.tar" for PG-9.0 production successfully ? Can we use the above said version on production ? Please help ! Thanks VB
Re: [GENERAL] Incremental backup with RSYNC or something?
On Fri, Nov 18, 2011 at 6:08 AM, Phoenix Kiula wrote: > On Mon, Nov 14, 2011 at 1:45 PM, Venkat Balaji > wrote: > >> Question: what can I do to rsync only the new additions in every table > >> starting 00:00:01 until 23:59:59 for each day? > > > > A table level replication (like Slony) should help here. > > > Slony needs more than one physical server, right? > Not necessarily; you can replicate within the same host as well. It all depends on which tables you want to replicate, not on the number of hosts. Thanks VB
Re: [GENERAL] : Postgres installation error on CentOS
Hi Alban, Thanks for the reply! I was able to resolve this issue, though not by removing the older version of zlib (I was unable to do so due to dependencies). I did not have the zlib-devel package matching the older zlib installed; once I installed it, the build succeeded. Regards, VB On Tue, Nov 15, 2011 at 5:56 PM, Alban Hertroys wrote: > On 15 November 2011 12:58, Venkat Balaji wrote: > > Hello, > > We are facing an issue while installing Postgres-9.0.1 on CentOS-5. > > That name always makes me wonder when they're releasing PennyOS or > DollarOS :P > > > Below is the error we are encountering - > > ./configure -- output > > checking for inflate in -lz... no > > configure: error: zlib library not found > ... > > /usr/bin/ld: skipping incompatible /usr/lib/libz.a when searching for -lz > > Apparently your installed libz doesn't provide a function that > configure is checking for. Perhaps upgrading it helps, if possible? > > Another possibility is that configure gets pointed to an old version > of zlib as the first result from LD. You could try removing that, but > you probably have dependencies on it from other packages. > > > Zlib rpms are installed and below is the rpm -qa output - > > [root@localhost postgresql-9.0.1]# rpm -qa | grep zlib > > zlib-1.2.3-4.el5 > > jzlib-1.0.7-4jpp.1 > > zlib-devel-1.2.3-4.el5 > > zlib-1.2.3-3 > > As a non-linux user this doesn't mean much to me. > -- > If you can't see the forest for the trees, > Cut the trees and you'll see there is no forest. >
[GENERAL] : Postgres installation error on CentOS
Hello, We are facing an issue while installing Postgres-9.0.1 on CentOS-5. Below is the error we are encountering - ./configure -- output checking for inflate in -lz... no configure: error: zlib library not found If you have zlib already installed, see config.log for details on the failure. It is possible the compiler isn't looking in the proper directory. Use --without-zlib to disable zlib support. Inside config.log, below is what we see - configure:8204: checking for inflate in -lz configure:8239: gcc -o conftest -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -D_GNU_SOURCE conftest.c -lz -lreadline -ltermcap -lcrypt -ldl -lm >&5 /usr/bin/ld: skipping incompatible /usr/lib/libz.a when searching for -lz /usr/bin/ld: cannot find -lz Zlib rpms are installed and below is the rpm -qa output - [root@localhost postgresql-9.0.1]# rpm -qa | grep zlib zlib-1.2.3-4.el5 jzlib-1.0.7-4jpp.1 zlib-devel-1.2.3-4.el5 zlib-1.2.3-3 Thanks VB
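As the follow-up in this thread notes, installing the missing zlib-devel variant fixed it. The "skipping incompatible /usr/lib/libz.a" line suggests an architecture mismatch between the compiler and the library, so a likely fix on CentOS 5 is (package name and architecture are assumptions, not from the post):

```shell
# configure links a test program against -lz; that needs a zlib-devel
# package matching the compiler's target architecture (x86_64 here):
yum install zlib-devel.x86_64
# then re-run configure from the PostgreSQL source tree:
./configure
```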
Re: [GENERAL] Incremental backup with RSYNC or something?
> > Question: what can I do to rsync only the new additions in every table > starting 00:00:01 until 23:59:59 for each day? > A table-level replication system (like Slony) should help here. Or a trigger-based approach with dblink would be another (but a bit more complex) option. Thanks VB
Re: [GENERAL] : postgres: archiver process failed on 0000000100000F72000000F0
This problem has been resolved! Thanks VB On Fri, Nov 11, 2011 at 9:58 PM, Venkat Balaji wrote: > Hello, > > The WAL archive process in our production is not working. > > [postgres@hostname]$ ps -ef | grep archive > postgres 12077 16015 0 10:19 pts/4 00:00:00 grep archive > postgres 31126 27607 0 Nov10 ? 00:01:18 postgres: archiver process > failed on 0000000100000F72000000F0 > > I see WAL files getting accumulated in the pg_xlog location and the status in > "archive_status" is shown as ".ready". > > Is there any way we can restart only the archiving process without disturbing > the actual cluster ? > > Actually, we had killed a process using "kill -9" and the db went into > recovery mode and came back up and running. > > We have no issues with the application as well. > > postgres=# select pg_is_in_recovery(); > > pg_is_in_recovery > --- > f > (1 row) > > Please help to resolve this ! > > Thanks > VB >
[GENERAL] : postgres: archiver process failed on 0000000100000F72000000F0
Hello, The WAL archive process in our production is not working. [postgres@hostname]$ ps -ef | grep archive postgres 12077 16015 0 10:19 pts/4 00:00:00 grep archive postgres 31126 27607 0 Nov10 ? 00:01:18 postgres: archiver process failed on 0000000100000F72000000F0 I see WAL files getting accumulated in the pg_xlog location and the status in "archive_status" is shown as ".ready". Is there any way we can restart only the archiving process without disturbing the actual cluster? Actually, we had killed a process using "kill -9" and the db went into recovery mode and came back up and running. We have no issues with the application as well. postgres=# select pg_is_in_recovery(); pg_is_in_recovery --- f (1 row) Please help to resolve this! Thanks VB
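The archiver retries the failed segment on its own, so no cluster restart is needed; fixing whatever makes archive_command fail (permissions, a full disk, a bad path) is enough, and the .ready backlog then drains. Illustrative checks (the paths assume a default layout and are not from the post):

```shell
# Segments still waiting to be archived:
ls "$PGDATA/pg_xlog/archive_status/" | grep '\.ready$'
# Run your configured archive_command by hand for the stuck segment to see
# the real error (the destination path here is illustrative):
cp "$PGDATA/pg_xlog/0000000100000F72000000F0" /backup/wal/0000000100000F72000000F0
```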
Re: [GENERAL] : failed: ERROR: could not open file "base/44620/972355": No such file or directory
Sorry forgot to mention the thread I referred to - http://archives.postgresql.org/pgsql-general/2010-12/msg01000.php Thanks VB On Thu, Nov 3, 2011 at 3:48 PM, Venkat Balaji wrote: > Hello Everyone, > > We had recently taken an online backup of our production database cluster > (pg_start_backup() - rsync - pg_stop_backup()). > > We had built the testing cluster with the backup. > > When we try to vacuum the database or vacuum full the testing database, > we are getting the following error. > > vacuuming of database failed: ERROR: could not open file > "base/44620/972355": No such file or directory > > In an other situation- > > postgres=# select * from table1 limit 10; > ERROR: could not open file "base/44620/1022275": No such file or directory > > After going through the below thread, one situation which we believe could > be the problem is - > > - Table1 is a partitioned table and we keep dropping and adding partitions > on daily basis. This might have happened at the time we were backing up. > > Is this the problem ? If yes, what precautions we need to take ? perform > the backup when there are no DDLs are happening ? > > Both the base files "972355" and "1022275" are not existing in production > (probably they belong to dropped partitions -- not sure). > > We have a data center migration coming up, so, this is very critical for > us - > > PG Version - 9.0.1 > OS - RHEL 5.5 (prod) and RHEL 6.1 (test) > > Please help > > Thanks > VB >
[GENERAL] : failed: ERROR: could not open file "base/44620/972355": No such file or directory
Hello Everyone, We recently took an online backup of our production database cluster (pg_start_backup() - rsync - pg_stop_backup()). We built a testing cluster from that backup. When we try to vacuum (or vacuum full) the testing database, we get the following error: vacuuming of database failed: ERROR: could not open file "base/44620/972355": No such file or directory In another situation - postgres=# select * from table1 limit 10; ERROR: could not open file "base/44620/1022275": No such file or directory After going through the thread below, one situation which we believe could be the cause is: table1 is a partitioned table, and we keep dropping and adding partitions on a daily basis. This might have happened while we were backing up. Is this the problem? If yes, what precautions do we need to take? Perform the backup only when no DDL is happening? Both the base files "972355" and "1022275" do not exist in production (probably they belong to dropped partitions -- not sure). We have a data center migration coming up, so this is very critical for us - PG Version - 9.0.1 OS - RHEL 5.5 (prod) and RHEL 6.1 (test) Please help Thanks VB
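For context, the backup sequence in question looks like this as a sketch (paths, label, and destination are illustrative). An online base backup is inconsistent by design; replaying the archived WAL during recovery is what makes it consistent again, including redoing the drop/create of partition files, which is why every archive between start and stop must be kept:

```shell
psql -U postgres -c "SELECT pg_start_backup('nightly_base')"
rsync -a --exclude 'pg_xlog/*' /usr/local/pgsql/data/ backuphost:/backups/data/
psql -U postgres -c "SELECT pg_stop_backup()"
```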
Re: [GENERAL] Server move using rsync
> > We're not doing this long-term, in order to have a backup server we can > fail-over to, but rather as a one-off low impact move of our database. > Consequently, instead of using pg_start_backup and pg_stop_backup, and > keeping all WAL, we're stopping the database, rsync of everything, and > starting the database in the new server, with it appearing to the new > server (if it was capable of noticing such things) that it had simply been > shutdown and restarted. > This is fine. If the database is shut down, the backup is completely consistent, and you can bring up the cluster from it without any issues. > The initial and repeated rsyncs while the first server is running and in > use, are solely in order to reduce the time that the rsync takes while the > postgresql application is stopped. > > Do you still think we need to do anything special with pg_start_backup, > pg_stop_backup, and WAL archives? > Yes. If the intermediate rsyncs are performed while the database cluster is up and running, then the "pg_start_backup() - rsync - pg_stop_backup()" sequence (as said earlier) must be used, so that Postgres knows a backup is in progress. pg_start_backup() performs a checkpoint and creates a backup label recording the checkpoint location and start time, and the WAL archived during the backup window is needed at recovery time to repair any half-written pages. Without doing pg_start_backup, and with rsync not performing a "snapshot" > backup, my assumption is that until we do an rsync with the service > shutdown, whatever we've got at the location we're copying to, is not > self-consistent. > The explanation above should answer this. > If we start up postgresql on it, won't it think it is recovering from a > sudden crash? 
I think it may either appear to recover ok, or complain about > various things, and not start up ok, with neither option providing us with > much insight, as all that could tell us is that either some disk blocks are > consistent, or some are not, which is our starting assumption anyway. > Starting up postgresql would probably result in more disk block changes > that will result in more work next time we rsync. > This is normal behavior of rsync. It all depends on how volatile is your system and volume of changes performed. > How badly can we screw things up, given we intend to perform a final rsync > with no postgresql services running? What should we try and avoid doing, > and why? > > We might simply compare some hashes between the two systems, of some files > that haven't had their last-modified dates changed since the last rsync. > All this will be taken care by Postgres with the help of WAL archive files generated at the time when you performed rsync with postgres services up and running. Thanks VB
Re: [GENERAL] Server move using rsync
> > "Another option is to use rsync to perform a file system backup. This is > done by first running rsync while the database server is running, then > shutting down the database server just long enough to do a second rsync. The > second rsync will be much quicker than the first, because it has relatively > little data to transfer, and the end result will be consistent because the > server was down. This method allows a file system backup to be performed > with minimal downtime." > > Except that we plan on an initial rsync which we think might take a couple > of days, then subsequent daily rsyncs for up to a week to keep it up to date > till we stop the old database, rsync again, and start the new database. > We are performing backups to our production server exactly the same way. We have been through some problems while restoring and bringing up the database. If you are planning to take initial complete rsync with subsequent "incremental rsyncs", then you need to make sure that you have all the WAL archives starting from the initial rsync on Day 1. Also are you doing the following? 1. pg_start_backup() - rsync - pg_stop_backup() ? 2. Please let us know your WAL Archive backup strategy. Is there any way during that week, that we can verify whether our partially > completed database move process is going to result in a database that starts > up ok? > In general, yes, database can start up normally. Without WAL Archives, recovering to a particular time would not be possible. Thanks VB
Re: [GENERAL] Are pg_xlog/* fiels necessary for PITR?
On Thu, Oct 27, 2011 at 7:57 PM, rihad wrote: > Hi, I'm backing up the entire server directory from time to time. pg_xlog/ > directory containing WAL files is pretty heavy (wal_level=archive). Can I > exclude it from the regular tar archive? > The best would be to perform "pg_switch_xlog()" and take a backup excluding pg_xlog. To recover the last moment TXNs, you might need pg_xlog (depends on when you would be recovering). pg_switch_xlog() will reduce the dependency on pg_xlog files to a greater extent. > > #!/bin/sh > > renice 20 $$ 2>/dev/null > pgsql -U pgsql -q -c "CHECKPOINT" postgres # speed up pg_start_backup() > pg_start_backup() performs a checkpoint and ensures that all the data till that particular checkpoint and TXN id will be backed up (or marked as needed for data consistency while restoring and recovering). > pgsql -U pgsql -q -c "select pg_start_backup('sol')" postgres > tar -cjf - /db 2>/dev/null | ssh -q -i ~pgsql/.ssh/id_rsa -p 2022 -c > blowfish dbarchive@10.0.0.1 'cat > db.tbz' > pgsql -U pgsql -q -c "select pg_stop_backup()" postgres > sleep 60 #wait for new WAL backups to appear > echo 'ssh -q dbarchive@10.0.0.1 ./post-backup.sh' | su -m pgsql > > > I want to change tar invocation to be: tar -cjf --exclude 'db/pg_xlog/*' > ... > > Will there be enough data in case of recovery? (May God forbid... ))) > But, all the WAL Archives between backup start time and end time must be backed up. They are needed at any cost for the database to be consistent and the recovery to be smooth. Recovering to any point-in-time purely depends on your backup strategy. Thanks VB
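A corrected tar invocation for the script above might look like this (note the "-" for stdout has to come directly after -cjf, before --exclude; the ssh options are from the original script):

```shell
tar -cjf - --exclude='db/pg_xlog/*' /db 2>/dev/null | \
  ssh -q -i ~pgsql/.ssh/id_rsa -p 2022 -c blowfish dbarchive@10.0.0.1 'cat > db.tbz'
```

All WAL archives generated between pg_start_backup() and pg_stop_backup() must still be retained elsewhere for the backup to be recoverable.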
Re: [GENERAL] List Permissions
My answers are inline below - How can I list a user's permissions table by table? > > i.e. User Joe > has read/write on table1 > has read on table2 > no access on table 3 > For a particular user you can use the function below; you can write a SQL query or script which takes table names from "pg_tables" one by one. has_table_privilege(user, table, privilege) Example: checking whether user "postgres" has the "select" privilege on "table1": postgres=# select has_table_privilege('postgres','public.table1','select'); has_table_privilege --------------------- t (1 row) For the current user (the user you are logged in as) you can use the following form: has_table_privilege(table, privilege) Example: postgres=# select current_user; current_user -------------- postgres (1 row) postgres=# select has_table_privilege('public.table1','select'); has_table_privilege --------------------- t (1 row) The link below covers the other access-privilege inquiry functions: http://www.postgresql.org/docs/9.0/static/functions-info.html Hope this helps! Thanks VB
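Combining has_table_privilege() with pg_tables gives the whole table-by-table listing in one query (a sketch; the user name 'joe' and the privilege set are examples):

```sql
SELECT schemaname, tablename,
       has_table_privilege('joe', quote_ident(schemaname) || '.' || quote_ident(tablename), 'SELECT') AS can_read,
       has_table_privilege('joe', quote_ident(schemaname) || '.' || quote_ident(tablename), 'INSERT') AS can_insert,
       has_table_privilege('joe', quote_ident(schemaname) || '.' || quote_ident(tablename), 'UPDATE') AS can_update
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY schemaname, tablename;
```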
Re: [GENERAL] : PostgreSQL Online Backup
Sorry for not responding to this email for so long. Alan, We had mentioned the following line in recovery.conf file (we had given pg_xlog location since we did not have WAL archives) - restore_command = 'cp /pg_xlog/%f %p' We found where the problem was - Here is what i did - 1. We had taken a full backup using pg_start_backup() and pg_stop_backup() on Day 1 2. Rest of the days (from Day 2 - Day 15), we had incrementally backed-up ( this is also using pg_start_backup() and pg_stop_backup()) 3. On Day-16th, when i started the recovery, PG was asking Day 1's WAL archive file, which we did not have. A fresh complete backup with change in our backup strategy resolved the issue. Thanks a lot for all your inputs and help on this !! Regards, VB 2011/10/4 Alan Hodgson > > > rsync works fine. Why exactly can't the recovery find the backed up > copy > > > of 000105390076? Please post your archive_command settings, > > > the contents of any script(s) called by that, and the recovery.conf > file > > > you're using that's having problems, as well as the complete process > you > > > followed to > > > initiate recovery. I strongly suspect you're missing part of the > process > > > of actually saving the WAL files needed for recovery. > > > The recovery is unable to find the WAL archive because, it was generated > on > > 26th September. > > > > Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the > files. > > > > I do not have that WAL archive copy. > > > > The problem area - > > > > I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not > > 256K). > > > > I'm going to need the rest of what I asked for to offer any further > suggestions > - especially the full and exact steps you took to initiate recovery and the > contents of recovery.conf. Also, please don't top-post. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] : PostgreSQL Online Backup
The recovery is unable to find the WAL archive because, it was generated on 26th September. Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the files. I do not have that WAL archive copy. The problem area - I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not 256K). Thanks VB 2011/10/3 Alan Hodgson > On October 3, 2011 05:33:35 AM Venkat Balaji wrote: > > Did anyone observe this behavior ?? Please help ! > > > > This is critical for us. I want to recommend not to use "rsync" (use cp > or > > scp instead) for production backup. > > > > rsync works fine. Why exactly can't the recovery find the backed up copy of > 000105390076? Please post your archive_command settings, the > contents of any script(s) called by that, and the recovery.conf file you're > using that's having problems, as well as the complete process you followed > to > initiate recovery. I strongly suspect you're missing part of the process of > actually saving the WAL files needed for recovery. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] : PostgreSQL Online Backup
Another problem in recovery (probably because of "rsync") - As said earlier, we take a production backup every day incrementally using "rsync". But somehow a few files miss getting synced, and Postgres keeps asking for back-dated archive files (from more than a week ago). I restored the October 2nd backup and PG asked for a September 26th archive file, with the last known up time being 26th Sep, 2011: 2011-10-03 07:17:12 CDT [12705]: [1-1] LOG: database system was interrupted; last known up at 2011-09-26 09:01:36 CDT 2011-10-03 07:17:12 CDT [12705]: [2-1] LOG: starting archive recovery cp: cannot stat `/usr/local/pgsql9.0.1/obtdata/data/pg_xlog/000000010000053900000076': No such file or directory 2011-10-03 07:17:12 CDT [12705]: [3-1] LOG: could not open file "pg_xlog/000000010000053900000076" (log file 1337, segment 118): No such file or directory 2011-10-03 07:17:12 CDT [12705]: [4-1] LOG: invalid checkpoint record 2011-10-03 07:17:12 CDT [12705]: [5-1] PANIC: could not locate required checkpoint record 2011-10-03 07:17:12 CDT [12705]: [6-1] HINT: If you are not restoring from a backup, try removing the file "/usr/local/pgsql9.0.1/obtdata/data/backup_label". 2011-10-03 07:17:12 CDT [12702]: [1-1] LOG: startup process (PID 12705) was terminated by signal 6: Aborted 2011-10-03 07:17:12 CDT [12702]: [2-1] LOG: aborting startup due to startup process failure I always see pg_clog files and some base files not getting synced. Below is what we are doing - pg_start_backup() rsync the data directory pg_stop_backup() The first "rsync" is fine, but the subsequent runs are generating inconsistency. We do the same every day to back up the data directory incrementally. What I observed is that PG records the TXN id whenever a backup starts and stops, plus a backup label. The next day, when PG records the start-backup time and TXN id, I think some of the TXN ids and pg_clog files generated between the last stop time and the next start time are missed. Did anyone observe this behavior? Please help! 
This is critical for us. I want to recommend not to use "rsync" (use cp or scp instead) for production backup. Thanks VB On Tue, Sep 27, 2011 at 2:36 PM, Albe Laurenz wrote: > Venkat Balaji wrote: > > Our problem is - > > > > We had mistakenly executed "rsync" on the running PostgreSQL data > directory (production) and we did > > not run "pg_start_backup()". > > > > Will this harm production ? can this lead to corruption ? > > I assume that you used rsync to copy *from* the data directory. > > This cannot lead to data corruption. > Only performance might suffer temporarily due to the additional I/O. > > The backup made with rsync will be unusable without pg_start_backup(). > > Yours, > Laurenz Albe >
Re: [GENERAL] could not access file "$libdir/pg_buffercache": No such file or directory
One way could be: if the database you are restoring is not very big, take the pg_dump with "--inserts" (a plain-SQL dump), remove the "CREATE FUNCTION" lines from the output file, and restore that. Thanks VB On Fri, Sep 30, 2011 at 10:59 AM, Venkat Balaji wrote: > I had faced the same problem 2 days earlier, and that was for the > "pg_freespacemap" contrib module. > > I did not know how to ignore these functions, so I installed the contrib > modules and restored. It worked ! > > I am also looking for a way to ignore these functions. > > Thanks > VB > > > On Fri, Sep 30, 2011 at 8:15 AM, Royce Ausburn wrote: > >> I'm in the process of testing out Postgres 9.0 for production use. I've >> been using it for development on my mac, a build from EnterpriseDB. We've >> just installed a 9.0.5 on an Ubuntu (Ubuntu 10.04.3 LTS) machine from a >> backport from lucid. There's an existing 8.4.8 postgres install also on >> that machine, but as far as I'm aware the debs are set up so you can happily >> have two installs side by side. >> >> I've dumped a test DB from my laptop and attempted to restore it on to the >> ubuntu machine, but I see errors: >> >> royce@fishy:~$ createdb test >> royce@fishy:~$ pg_restore -d test --no-owner test_RTR.pgc >> pg_restore: [archiver (db)] Error while PROCESSING TOC: >> pg_restore: [archiver (db)] Error from TOC entry 65; 1255 2194102 FUNCTION >> pg_buffercache_pages() persona >> pg_restore: [archiver (db)] could not execute query: ERROR: could not >> access file "$libdir/pg_buffercache": No such file or directory >> Command was: CREATE FUNCTION pg_buffercache_pages() RETURNS SETOF >> record >> LANGUAGE c >> AS '$libdir/pg_buffercache', 'pg_buffercache_... >> >> >> After looking around a bit, my theory on these errors is that my mac has >> some contrib module installed that adds views to my test database, and those >> views have been included in the dump. 
The contrib module is apparently not >> installed on the ubuntu machine and hence the restore can't create the >> views. Is this theory correct? If so: >> >> - Can these errors be safely ignored? (not ideal, because I'll miss other, >> real errors) >> - Is there some simple way I can prevent these views from being emitted as >> part of the dump? >> - Is there some simple way I can prevent these views from being restored >> from a dump that contains them? >> - Otherwise, how can I purge these contrib modules from my laptop? >> >> Otherwise, if my theory is incorrect, any hints as to what it might be? >> >> Cheers! >> >> --Royce >> >> Mac: >> >> version >> >> >> PostgreSQL 9.0.4 on x86_64-apple-darwin, compiled by GCC >> i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit >> (1 row) >> >> >> Ubuntu box: >> >> version >> >> - >> PostgreSQL 9.0.5 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real >> (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit >> (1 row) >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general >> > >
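With an archive-format dump (as in the pg_restore example above), the contrib functions can also be skipped without editing any SQL, by filtering the dump's table of contents (file names follow the post; the grep pattern is an example):

```shell
pg_restore -l test_RTR.pgc > full.toc                 # list the dump's contents
grep -v 'pg_buffercache' full.toc > filtered.toc      # drop the contrib entries
pg_restore -d test --no-owner -L filtered.toc test_RTR.pgc
```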
Re: [GENERAL] could not access file "$libdir/pg_buffercache": No such file or directory
I had faced the same problem 2 days earlier and that was for "pg_freespacemap" contrib module. I did not know the way to ignore these functions and installed THE contrib modules and restored. It worked ! I am also looking for a way to ignore these functions. Thanks VB On Fri, Sep 30, 2011 at 8:15 AM, Royce Ausburn wrote: > I'm in the process of testing out Postgres 9.0 for production use. I've > been using it for development on my mac, a build from EnterpriseDB. We've > just installed a 9.0.5 on an Ubuntu (Ubuntu 10.04.3 LTS) machine from a > backport from lucid. There's an existing 8.4.8 postgres install also on > that machine, but as far as I'm aware the debs are set up so you can happily > have two installs side by side. > > I've dumped a test DB from my laptop and attempted to restore it on to the > ubuntu machine, but I see errors: > > royce@fishy:~$ createdb test > royce@fishy:~$ pg_restore -d test --no-owner test_RTR.pgc > pg_restore: [archiver (db)] Error while PROCESSING TOC: > pg_restore: [archiver (db)] Error from TOC entry 65; 1255 2194102 FUNCTION > pg_buffercache_pages() persona > pg_restore: [archiver (db)] could not execute query: ERROR: could not > access file "$libdir/pg_buffercache": No such file or directory >Command was: CREATE FUNCTION pg_buffercache_pages() RETURNS SETOF record >LANGUAGE c >AS '$libdir/pg_buffercache', 'pg_buffercache_... > > > After looking around a bit, my theory on these errors is that my mac has > some contrib module installed that adds views to my test database, and those > views have been included in the dump. The contrib module is apparently not > installed on the ubuntu machine and hence the restore can't create the > views. Is this theory correct? If so: > > - Can these errors be safely ignored? (not ideal, because I'll miss other, > real errors) > - Is there some simple way I can prevent these views from being emitted as > part of the dump? 
> - Is there some simple way I can prevent these views from being restored > from a dump that contains them? > - Otherwise, how can I purge these contrib modules from my laptop? > > Otherwise, if my theory is incorrect, any hints as to what it might be? > > Cheers! > > --Royce > > Mac: > > version > > > PostgreSQL 9.0.4 on x86_64-apple-darwin, compiled by GCC > i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit > (1 row) > > > Ubuntu box: > > version > > - > PostgreSQL 9.0.5 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real > (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit > (1 row) > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] : Looking for a PostgreSQL book
Thanks Adam ! Regards, VB On Thu, Sep 29, 2011 at 12:03 AM, Adam Cornett wrote: > The same publisher (Packt) has a book *PostgreSQL > 9 Administration Cookbook* by Simon Riggs and Hannu Krosing that is > equally useful as Greg's *High Performance* book > > > On Wed, Sep 28, 2011 at 1:14 PM, Venkat Balaji wrote: > >> Hello Everyone, >> >> I have been working on PostgreSQL for quite a while (2 yrs) now. >> >> I have got "PostgreSQL 9.0 High Performance" book and quite excited to go >> through it. >> >> Please let me know any source where i can get more books on PG, I am >> especially looking for books on PG internals, architecture, Backup & >> Recovery and HA. >> >> Looking forward for the information. >> >> Regards, >> VB >> > > > > -- > Adam Cornett > adam.corn...@gmail.com > (678) 296-1150 >
[GENERAL] : Looking for a PostgreSQL book
Hello Everyone, I have been working with PostgreSQL for quite a while (2 years) now. I have got the "PostgreSQL 9.0 High Performance" book and am quite excited to go through it. Please let me know of any sources where I can get more books on PG; I am especially looking for books on PG internals, architecture, backup & recovery, and HA. Looking forward to the information. Regards, VB
Re: [GENERAL] PostgreSQL recovery when lost some file in data\global
Hi Tuan Hoang Anh, Are you able to bring up the cluster? Please let us know what problem you are facing. Thanks Venkat On Tue, Sep 27, 2011 at 12:08 PM, tuanhoanganh wrote: > I am running PostgreSQL 9.0.1 32bit on Windows 2003. Last night my disk had > some problem and I lost some files in data\global. > Is there any way to recover PostgreSQL? > > Thanks in advance. Sorry for my English. > > Tuan Hoang Anh >
Re: [GENERAL] : PostgreSQL Online Backup
Thanks for all your inputs! Our problem is: we had mistakenly executed "rsync" on the running PostgreSQL data directory (production) without running "pg_start_backup()". Will this harm production? Can this lead to corruption? Thanks - On Mon, Sep 26, 2011 at 10:29 PM, Alan Hodgson wrote: > On September 26, 2011 05:49:50 AM Venkat Balaji wrote: > > I tried restoring the backup, after taking the full backup. > > > > Below is what i see in the "archive destination". > > > > Postgres was asking for "00010193006F" and i tried to find > the > > same and below is what i find... > > > > -rw--- 1 postgres postgres 3.3M Sep 26 02:06 > > 00010193006F.gz -rw--- 1 postgres postgres 219 Sep 26 > > 02:53 > > 00010193006F.00328508.backup.gz > > > > Why is PG (9.0) putting an extension for the WAL Archive file as > > > "backup.gz" ?? > > > > The archive files are created by your archive_command, as specified in > postgresql.conf. My guess would be that your archive command runs the files > through gzip as part of archiving (which is fine). > > However, the restore_command you specify in recovery.conf must undo this > compression. So instead of (for example) 'cp -f "%f" "%p"', it might > instead > need to look like 'zcat "%f" > "%p"'. > > Hope this helps. > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
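For completeness, the compress-on-archive setup Alan describes would look roughly like the following. This is a config sketch only; the /wal_archive path is a hypothetical placeholder.

```shell
# postgresql.conf on the primary -- gzip each WAL segment while archiving:
#   archive_command = 'gzip < %p > /wal_archive/%f.gz'
#
# recovery.conf on the restoring server -- restore_command must undo the
# compression, as Alan points out:
#   restore_command = 'zcat /wal_archive/%f.gz > %p'
```

The key point is symmetry: whatever transformation archive_command applies, restore_command must reverse it exactly.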
Re: [GENERAL] : PostgreSQL Online Backup
I tried restoring the backup after taking the full backup. Below is what I see in the "archive destination". Postgres was asking for "00010193006F"; I tried to find it, and below is what I found - -rw--- 1 postgres postgres 3.3M Sep 26 02:06 00010193006F.gz -rw--- 1 postgres postgres 219 Sep 26 02:53 00010193006F.00328508.backup.gz Why is PG (9.0) putting an extension of "backup.gz" on the WAL archive file? Please help! Thanks VB On Mon, Sep 26, 2011 at 5:11 PM, Venkat Balaji wrote: > Hello Everyone, > > We have had situations where-in "rsync" was executed without executing > "pg_start_backup()" on the production data directory and on the next runs, > "pg_start_backup()" has been executed with "rsync". This was to avoid high > IO load on production. We ended up getting unmatched files (especially in > pg_clog) and not sure about "base" directory. > > Postgres is asking for WAL Archive files dated sometime around 15 days ago. > We are absolutely not sure whats going on. > > Is this dangerous for production (like corruption) ? or just the backup > will be invalid ? Please help us know if we have to perform any > precautionary checks on the production cluster. > > Apart from firing a checkpoint, does "pg_start_backup()" updates any > dictionary tables or views ? or it updates anything in "pg_xlog" > > Looking forward for your help ! > > Thanks > VB > > >
[GENERAL] : PostgreSQL Online Backup
Hello Everyone, We have had situations wherein "rsync" was executed without executing "pg_start_backup()" on the production data directory, and on subsequent runs "pg_start_backup()" was executed along with "rsync". This was done to avoid high IO load on production. We ended up with unmatched files (especially in pg_clog), and we are not sure about the "base" directory. Postgres is asking for WAL archive files dated around 15 days ago. We are absolutely not sure what is going on. Is this dangerous for production (like corruption)? Or will just the backup be invalid? Please help us know if we have to perform any precautionary checks on the production cluster. Apart from firing a checkpoint, does "pg_start_backup()" update any dictionary tables or views? Or does it update anything in "pg_xlog"? Looking forward to your help! Thanks VB
Re: [GENERAL] : Checksum ERROR when restoring Online Backup
Thanks Richard! I realized that I was restoring on a 32-bit server. Regards, Venkat On Fri, Sep 23, 2011 at 6:59 PM, Richard Huxton wrote: > On 23/09/11 13:53, Venkat Balaji wrote: > >> Linux *prod-server* 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT >> 2010 x86_64 x86_64 x86_64 GNU/Linux >> >> The server version where I restored the production backup is as follows - >> Linux *backup-server* 2.6.18-194.3.1.el5PAE #1 SMP Sun May 2 04:42:25 >> EDT 2010 i686 i686 i386 GNU/Linux >> >> I read some where that, Postgres datafiles are not architecture >> independent. >> > > They (the WAL files) are not, and it looks like you're trying to restore a > 64-bit version onto a 32-bit server. That's not going to work. A > pg_dump/restore works of course, and if you need replication then Slony can > handle this. > > -- > Richard Huxton > Archonet Ltd >
Re: [GENERAL] : Checksum ERROR when restoring Online Backup
Apologies - I did not mention the complete details of PG and the OS. Postgres 9.0.1. The operating system version of the production server where Postgres is running is as follows - Linux *prod-server* 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux The server version where I restored the production backup is as follows - Linux *backup-server* 2.6.18-194.3.1.el5PAE #1 SMP Sun May 2 04:42:25 EDT 2010 i686 i686 i386 GNU/Linux I read somewhere that Postgres data files are not architecture independent. Please help! Thanks Venkat On Fri, Sep 23, 2011 at 6:11 PM, Venkat Balaji wrote: > Hello Everyone, > > I am testing the Online Backups of our production databases ( this is part > of our disaster recovery plan ). > > After restoring the Online Backup, we tried to bring up the cluster and > ended up with the following error - > > 2011-09-23 07:29:04 CDT [24092]: [1-1] FATAL: incorrect checksum in > control file. > > Does this message mean, the Online Backup is corrupted or invalid ? > > Normally, we get recovery related error messages. This is the first time we > are facing a problem like this. > > Please help as this is critical for us. > > Thanks > Venkat >
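A file-level base backup is only usable on a matching architecture, so a cheap pre-restore sanity check is to compare the two hosts' `uname -m` outputs. The helper below is a hypothetical convenience, not a PostgreSQL tool.

```shell
# Sketch: catch 64-bit -> 32-bit restore attempts before they fail at startup.
same_arch() {
    if [ "$1" = "$2" ]; then
        echo "architectures match: $1"
    else
        echo "MISMATCH: backup is $1, target is $2 - use pg_dump/pg_restore instead" >&2
        return 1
    fi
}

# Usage (prod-server host name is from the thread; the ssh call is illustrative):
#   same_arch "$(ssh prod-server uname -m)" "$(uname -m)"
```

In the case above this would have flagged x86_64 (prod-server) versus i686 (backup-server) before any data was copied.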
[GENERAL] : Checksum ERROR when restoring Online Backup
Hello Everyone, I am testing the online backups of our production databases (this is part of our disaster recovery plan). After restoring the online backup, we tried to bring up the cluster and ended up with the following error - 2011-09-23 07:29:04 CDT [24092]: [1-1] FATAL: incorrect checksum in control file. Does this message mean that the online backup is corrupted or invalid? Normally, we get recovery-related error messages. This is the first time we are facing a problem like this. Please help, as this is critical for us. Thanks Venkat
[GENERAL] Calculate Vacuum Metrics
Hello Everyone, I am in the process of scheduling a VACUUM FULL for our production databases, where downtime is extremely critical. Can someone please help me calculate the amount of free space (or free pages) in a table and its indexes (even after regular autovacuum or vacuum analyze has run)? This will help me estimate the amount of space that will be reclaimed once the VACUUM FULL is performed. I am using the query below (which I found through Google) to calculate the wasted space and bloat in tables and indexes.

SELECT
  current_database(), schemaname, tablename,
  /*reltuples::bigint, relpages::bigint, otta,*/
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
  SELECT
    schemaname, tablename, cc.reltuples, cc.relpages, bs,
    CEIL((cc.reltuples*((datahdr+ma-
      (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta,
    COALESCE(c2.relname,'?') AS iname,
    COALESCE(c2.reltuples,0) AS ituples,
    COALESCE(c2.relpages,0) AS ipages,
    COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
  FROM (
    SELECT
      ma, bs, schemaname, tablename,
      (datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
      (maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
    FROM (
      SELECT
        schemaname, tablename, hdr, ma, bs,
        SUM((1-null_frac)*avg_width) AS datawidth,
        MAX(null_frac) AS maxfracsum,
        hdr+(
          SELECT 1+count(*)/8
          FROM pg_stats s2
          WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
        ) AS nullhdr
      FROM pg_stats s, (
        SELECT
          (SELECT current_setting('block_size')::numeric) AS bs,
          CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
          CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
        FROM (SELECT version() AS v) AS foo
      ) AS constants
      GROUP BY 1,2,3,4,5
    ) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid
    AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC;

Can I continue to use it, or is there a better way to get this information? Please help! Thanks Venkat
Re: [GENERAL] warm standby - apply wal archives
Syncing just the WAL archive directory every minute should not be a problem at all (it is running rsync every minute against the data directory that is not recommended). As said earlier, we had configured a warm standby for a database of 2 TB whose WAL archive generation ran into the hundreds, and we did not encounter any issues running an rsync job at a frequency of even less than a minute. We made sure the rsync job ran on the standby server (we were pulling the WAL archives to the standby):

1. compress the WAL archives
2. rsync to the standby site
3. uncompress on the standby site

Thanks Venkat On Sat, Sep 17, 2011 at 6:06 PM, MirrorX wrote: > just another update since the system is up and running and one more > question > :p > > the secondary server is able to restore the wal archives practically > immediately after they arrive. i have set a rsync cron job to send the new > wals every 5 minutes. the procedure to transfer the files and to restore > them takes about 30 seconds (the number of archives is about 20-30). i ve > tried to set it to 2 minutes, and then the procedure takes about 20 seconds > (both transfer and restoration) while i didnt notice any impact on the > primary server (the procedure is initiated on the secondary server). what > is > your opinion about the time interval that the cron job should run? i ve > read many articles online indicating that rsync should not run every 1 > minute, but in my case isn't it different since it just syncs two folder > containing only wals and not the whole disks? plus both folders on the > servers are in different partitions. > thx in advance for your insight > > -- > View this message in context: > http://postgresql.1045698.n5.nabble.com/warm-standby-apply-wal-archives-tp4770567p4813659.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
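The compress / rsync / uncompress steps above can be sketched as a small loop. Directory names are hypothetical placeholders, and the rsync transfer is left as a comment (it needs real hosts); the gzip round trip is the shape of the pipeline.

```shell
# Sketch of the compress -> rsync -> uncompress WAL shipping pipeline.
# src_dir and dst_dir are hypothetical; in production src_dir is the
# primary's WAL archive directory and dst_dir lives on the standby.
ship_wal() {
    src_dir=$1
    dst_dir=$2
    for wal in "$src_dir"/*; do
        [ -f "$wal" ] || continue
        gzip -c "$wal" > "$dst_dir/$(basename "$wal").gz"   # 1. compress
        # 2. in real use, an rsync from the standby pulls "$dst_dir"/*.gz here
        gunzip -f "$dst_dir/$(basename "$wal").gz"          # 3. uncompress
    done
}
```

Each 16 MB segment typically compresses to a few MB, which is what makes the sub-minute cron frequency discussed above cheap on the network.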
Re: [GENERAL] Has Pg 9.1.0 been released today?
Yes. I would be excited to know if there is a possibility of a multi-master replication system for Postgres. We will soon be using 9.1 streaming replication. Thanks Venkat On Tue, Sep 13, 2011 at 1:31 AM, Aleksey Tsalolikhin < atsaloli.t...@gmail.com> wrote: > Congratulations on the release of 9.1.0! > > Lots of great features, I for one can't wait to try out unlogged > tables, that should help a lot in our environment. > > Now that you have streaming replication both async and sync, are you > working on multi-master replication? *excited* Or what's the > roadmap? > > Thanks again and keep up the great work! > > Aleksey > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] warm standby - apply wal archives
Considering the size of the WAL archives (200 GB per day), compressing them using gzip (you can run the command from a shell script and place it in archive_command as well) would possibly reduce the size to as low as 10 - 20 GB. Please let us know the results. Thanks Venkat On Tue, Sep 6, 2011 at 1:03 PM, MirrorX wrote: > The network bandwidth between the servers is definitely not an issue. What > is > bothering me is the big size of the wal archives, which goes up to 200GB > per > day and if the standby server will be able to replay all these files. The > argument that; since the master can do it and also do various other tasks > at > the same time, and since the secondary is identical to the first, so he > should be able to do that seems valid, so i will give it a try and let you > know about the results. In the meantime if there are any other > ideas/suggestions etc please let me know. thx to all > > -- > View this message in context: > http://postgresql.1045698.n5.nabble.com/warm-standby-apply-wal-archives-tp4770567p4773498.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] warm standby - apply wal archives
In my experience, I had configured a warm standby for a 2 TB Postgres cluster (PostgreSQL 8.4). Note: I do not know your database size and WAL archive generation rate. The important considerations I made were as follows - 1. WAL archive transfer from production to standby depends on the network bandwidth (I think you said there is no issue there) and the size of the WAL archives. 2. The transfer rate can be optimized by compressing the WAL files. Each WAL file would reduce from 16 MB to 2 - 3 MB (only in the case of a warm standby; with streaming replication the size would only decrease to 6 or 7 MB), which makes a huge difference for the network bandwidth. Compress the WAL archives on production, then transfer & uncompress the WALs on the standby. I did this successfully. Hope this helps! Thanks Venkat On Tue, Sep 6, 2011 at 2:57 AM, MirrorX wrote: > the nodes communicate through 4Gbps ethernet so i dont think there is an > issue there. probably some kind of misconfiguration of DRBD has occured. i > will check on that tommorow. thx a lot :) > > -- > View this message in context: > http://postgresql.1045698.n5.nabble.com/warm-standby-apply-wal-archives-tp4770567p4772126.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] Postgresql-9.0.1 Recovery
Thanks Craig! Below is what I did - 1. pg_start_backup() 2. rsync the data dir 3. pg_stop_backup() I believe the backup is valid because I was able to bring up the cluster without any issues (of course, with data loss). Positive signs - I am able to bring up the cluster with the online backup, but only with the loss of data. Negative signs, and things to watch for closely when testing backups - - pg_clog files were not synced. I suspect they were being written to at the time of the backup; I might have synced the data dir while the pg_clog files were half filled. - Though the WAL archives are there, Postgres does not try to recover beyond the timestamp at which pg_clog was missing. - Even if I replace the missing pg_clog files (which I did), Postgres asks for the corresponding WAL archive files. Yes. What I learnt is that we need to ensure that all the pg_clog files are fully copied as of the backup time. We cannot afford to miss any of them. Thanks Venkat On Wed, Aug 31, 2011 at 5:46 AM, Craig Ringer wrote: > On 30/08/2011 6:59 PM, Venkat Balaji wrote: > >> Hello Everyone, >> >> I have a situation here - >> >> I am trying to restore the production online backup and recover the same. >> >> - I had initially rsynced (excluded pg_log) the data directory and the >> tarred and zipped the same >> > > Did you do that after pg_start_backup() or on a stopped database server? > > If you did it on a running database server without first running > pg_start_backup(), your backup is invalid. > > Personally I like to take my base backups from an LVM snapshot of the > datadir just to be extra safe. That isn't necessary, though, and a regular > rsync or tar or whatever of a datadir after pg_start_backup() is fine. > > Remember to run pg_stop_backup() afterwards. > > > - I got an error "unable to read from pg_clog location" >> (file size is around 160K) >> > > ... from PostgreSQL, when you tried to start it? > > What emitted that error message? 
> > > What i understood is that, rsync some how missed out on syncing the >> files in "pg_clog" so, i had manually coped the missing pg_clog file >> from production and tried recovery. >> > > That won't work. You need a consistent snapshot of all the files in the > data dir. You cannot just mix and match copies taken at different times. > > For efficiency reasons PostgreSQL will recycle used clog files. You can't > just copy a file over and hope that because it has the same name, it still > contains the data you want. > > Your backup *failed* at the point where you got an incomplete copy of the > data directory. > > > Do i need to get that particular wal archive which is before online >> backup time ? >> > > No, you need to get the missing clog files. If you cannot do that, try > using pg_resetxlog, but be aware that that may lose transactions and can > potentially cause corruption of tables and indexes. > > > By this experience what i understand is that Postgresql stores committed >> and uncommited transactions in pg_xlog / wal archive files and >> information (not the transaction data) about transaction commit status >> is stored in pg_clog. Am I correct ? >> > > That sounds right to me, but I don't know as much about how Pg stores > things as I should. > > > I am in the process of designing a disaster recovery planner for our >> productions systems. >> > > Congratulations! > > Be extremely glad this didn't happen in a real recovery scenario. This is a > marvellous example of why you should always test your backups - you actually > did, and found a problem that would've been a critical issue if the backup > were actually needed. > > -- > Craig Ringer >
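Craig's central point in this thread is that the whole data directory must come from one consistent copy taken inside the pg_start_backup()/pg_stop_backup() window. A sketch of that procedure is below; the psql calls are shown as comments, directory names are hypothetical, and the tar round trip mirrors the "tarred and zipped" approach from the original post.

```shell
# Sketch: take one atomic tarball of the data directory (minus pg_log),
# then verify the copy. Directory and label names are hypothetical.
snapshot_datadir() {
    datadir=$1
    out_tar=$2
    # In real use, bracket the copy with:
    #   psql -c "SELECT pg_start_backup('base_backup', true);"
    tar -czf "$out_tar" --exclude='pg_log' -C "$datadir" .
    #   psql -c "SELECT pg_stop_backup();"
}

restore_datadir() {
    in_tar=$1
    target=$2
    mkdir -p "$target"
    tar -xzf "$in_tar" -C "$target"
}
```

Because the whole tree is captured in a single tar invocation, pg_clog cannot end up as a mix of files copied at different times - which is exactly the failure mode described above.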
[GENERAL] Postgresql-9.0.1 Recovery
Hello Everyone, I have a situation here - I am trying to restore a production online backup and recover it. - I had initially rsynced the data directory (excluding pg_log) and then tarred and zipped it - SCP'd the tar to a different server, then untarred and unzipped it - I got an error "unable to read from pg_clog location" (the file size is around 160K) I have a backup as of Aug 24th, and the pg_clog file which PG is unable to read is as of Aug 5th. I am aware that the size of the files in pg_clog is supposed to be 256K. What I understood is that rsync somehow missed syncing the files in "pg_clog", so I manually copied the missing pg_clog file from production and tried recovery. To my surprise, PG recovery then asked for the corresponding pg_xlog (WAL archive) file as of Aug 5th. Is there a way to recover this? Do I need to get that particular WAL archive from before the online backup time? I do have all the other files up to Aug 24th. From this experience, what I understand is that PostgreSQL stores committed and uncommitted transactions in pg_xlog / WAL archive files, and information (not the transaction data) about transaction commit status is stored in pg_clog. Am I correct? I am in the process of designing a disaster recovery plan for our production systems. Version - PG 9.0.1 OS - RHEL 5 Please advise! Regards, Venkat
Re: [GENERAL] heavy swapping, not sure why
It is recommended to identify the processes that genuinely need a high work_mem and set work_mem to a higher value only at the session level for those. In this case, all the connections using the maximum work_mem are the potential threat. As said by Zoltán, work_mem is very high, and shared_buffers as well. Other considerations would be the following - - the allocated kernel memory settings (like shmmax and shmall) - how much memory is used by system-level processes (like root and non-pg users) - it also depends on the database size and the amount of data being accessed across CPUs and memory - we need to ensure that unnecessary data is not being read into memory (queries hitting non-vacuumed tables, slow-performing queries, unnecessary full table scans etc.) Regards, Venkat On Tue, Aug 30, 2011 at 3:30 PM, Boszormenyi Zoltan wrote: > Hi, > > On 2011-08-29 22:36, Lonni J Friedman wrote: > > ... I read that > > (max_connections * work_mem) should never exceed physical RAM, and if > > that's accurate, then I suspect that's the root of my problem on > > systemA (below). > > work_mem is process-local memory so > > (max_connections * work_mem) < (physical RAM - shared_buffers) > > Some queries may allocate multiples of work_mem, too. > > Also, the kernel uses some memory for internal accounting, caching > and you need to account for the process binary in memory. > > > However, I'd like confirmation before I start > > tweaking things, as one of these servers is in production, and I can't > > easily tweak settings to experiment (plus this problem takes a few > > weeks before swapping gets bad enough to impact performance). 
> > > > A few examples: > > > > 0) system A: 56GB RAM, running postgresql-8.4.8 with the following > parameters: > > maintenance_work_mem = 96MB > > effective_cache_size = 40GB > > work_mem = 256MB > > wal_buffers = 16MB > > shared_buffers = 13GB > > max_connections = 300 > > RAM (56GB) - shared_buffers (13GB) = 43GB > > which is less than > > work_mem * max_connections = 300 * 0.25GB = 75GB > > The system would start swapping before 43GB/0.25GB = 172 clients. > > > 1) system B: 120GB RAM, running postgresql-9.0.4 with the following > parameters: > > maintenance_work_mem = 1GB > > effective_cache_size = 88GB > > work_mem = 576MB > > wal_buffers = 4MB > > shared_buffers = 28GB > > max_connections = 200 > > Similarly: > > 120GB - 28GB = 92GB > > is less than > > work_mem * max_connections = 200 * 576MB = 112.5GB > > Also, if you run anything else on the machine then the system would start > swapping much sooner than hitting max_connections number of clients. > > I would never set work_mem that high by default. 8 - 16MB is usually > enough for the common case and you can set work_mem for special > queries from the client and then reset it. > > Best regards, > Zoltán Böszörményi > > -- > -- > Zoltán Böszörményi > Cybertec Schönig & Schönig GmbH > Gröhrmühlgasse 26 > A-2700 Wiener Neustadt, Austria > Web: http://www.postgresql-support.de > http://www.postgresql.at/ > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
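The arithmetic Zoltán walks through above generalizes to a simple pre-flight check. This is a sketch only: sizes are in MB, and the threshold is exactly his rule of thumb (worst-case work_mem usage must fit in RAM minus shared_buffers, ignoring queries that allocate multiples of work_mem).

```shell
# Sketch: flag configurations where work_mem * max_connections can
# exceed the RAM left over after shared_buffers. All sizes in MB.
check_mem() {
    ram_mb=$1; shared_buffers_mb=$2; work_mem_mb=$3; max_conn=$4
    headroom=$((ram_mb - shared_buffers_mb))
    worst_case=$((work_mem_mb * max_conn))
    if [ "$worst_case" -gt "$headroom" ]; then
        echo "RISK: worst case ${worst_case}MB > headroom ${headroom}MB"
        return 1
    fi
    echo "OK: worst case ${worst_case}MB <= headroom ${headroom}MB"
}

# System A from the thread: 56GB RAM, 13GB shared_buffers, 256MB work_mem,
# 300 connections -> worst case 75GB against 43GB of headroom, hence the swapping.
#   check_mem 57344 13312 256 300
```

This reproduces Zoltán's numbers: system A starts swapping well before reaching max_connections.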