Re: [GENERAL] warnings about invalid "search_path"

2012-08-08 Thread Samba
Thanks Scott,

I'll try that out; I hope this would solve my problem...

Regards,
Samba


On Tue, Aug 7, 2012 at 8:30 PM, Scott Marlowe wrote:

> On Tue, Aug 7, 2012 at 7:15 AM, Samba  wrote:
> > Thanks Gabriele for those pointers,
> >
> > I could now narrow it down to two things:
> >
> > "system_data" user logging into other databases [one of those may be the
> > default 'postgres'] which does not have "system_data" schema
> > other users [like 'postgres'] logging into their own or even other
> databases
> > which does not have "system_data" schema.
> >
> > I did notice that we have added "system_data" schema to a few other users
> > [roles]  who are also supposed to login to the database containing
> > "system_data" schema and that is causing this side-effect of logging
> these
> > warning messages when users who have "system_data" in the search_path log
> > into other databases that do not have "system_data" schema.
> >
> > So, what i understand is needed for me is "how to add a schema (or
> multiple
> > schemas) in the search path for a database irrespective of whichever user
> > logs in to the database?"
> >
> > Could you explain how to add "schema(s) into search_path for a database
> > irrespective of the user logging-in?
>
> You can set  search path for a particular database:
>
> alter database xyz set search_path='abc','xyz';
>
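For reference, a minimal sketch of how this can be scoped (the database name
"mydb" below is only a placeholder):

ALTER DATABASE mydb SET search_path = system_data, public;
-- or, to scope it to one role but only inside that database:
ALTER ROLE system_data IN DATABASE mydb SET search_path = system_data, public;

Combined with resetting any old role-level setting (ALTER ROLE system_data
RESET search_path;), this keeps the schema out of search_path in databases
that do not have it.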


Re: [GENERAL] warnings about invalid "search_path"

2012-08-07 Thread Samba
Hi all,

I now realize that the issue is indeed occurring when users who have
"system_data" in their search_path log in to other databases that do not
have that schema.


Could someone explain how to add schema(s) to the search_path for a database
[not for a user/role], irrespective of which user logs in?


Thanks and Regards,
Samba

---

On Tue, Aug 7, 2012 at 7:50 PM, Albe Laurenz wrote:

> Samba wrote:
> > I'm seeing some weird errors in the postgres logs after upgrading
> > to postgres-9.1(.3) about the schema added by default to the search_path
> >
> > WARNING:  invalid value for parameter "search_path": "system_data"
> > DETAIL:  schema "system_data" does not exist
> >
> > We do have a user named "system_data" and a schema with the same
> name...
> >
> > Apart from these warning messages, there is no other problem with it
>
> Maybe the warning is from a different database that does not have such
> a schema.
>
> Try to add %d to the log_line_prefix parameter.
>
> Yours,
> Laurenz Albe
>
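To act on that suggestion, a postgresql.conf line along these lines (the exact
escapes chosen here are only an example) will show the database name in every
log entry after a reload:

log_line_prefix = '%t %u@%d '    # timestamp, user and database

With %d in place, the log should reveal which database the search_path warning
is actually being raised in.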


Re: [GENERAL] warnings about invalid "search_path"

2012-08-07 Thread Samba
Thanks Gabriele for those pointers,

I could now narrow it down to two things:


   1. "system_data" user logging into other databases [one of those may be
   the default 'postgres'] which does not have "system_data" schema
   2. other users [like 'postgres'] logging into their own or even other
   databases which does not have "system_data" schema.

I did notice that we have added the "system_data" schema to the search_path of
a few other users [roles] who are also supposed to log in to the database
containing the "system_data" schema, and that is causing this side effect of
logging these warning messages when users who have "system_data" in their
search_path log into other databases that do not have the "system_data" schema.

So, what I understand I need is "how to add a schema (or multiple schemas)
to the search_path for a database, irrespective of which user logs in to the
database".

Could you explain how to add schema(s) to the search_path for a database
irrespective of the user logging in?

Thanks and Regards,
Samba

===

On Tue, Aug 7, 2012 at 4:53 PM, Gabriele Bartolini <
gabriele.bartol...@2ndquadrant.it> wrote:

> Hi Samba,
>
>   first: do not worry, it is perfectly normal.
>
>
> On Tue, 7 Aug 2012 16:25:14 +0530, Samba  wrote:
>
>> Hi all,
>> I'm seeing some weird errors in the postgres logs after upgrading to
>> postgres-9.1(.3) about the schema added by default to the search_path
>>
>>  WARNING:  invalid value for parameter "search_path": "system_data"
>> DETAIL:  schema "system_data" does not exist
>>
>> We do have a user named "system_data" and a schema with the same
>> name...
>>
>
> A schema is something that belongs to a database.
>
> You have just set the search_path for a specific user ("system_data") to
> include "system_data" (am I right?). However, a user can theoretically
> connect to any database on the instance.
>
> The error above is generated when you connect with that user
> ("system_data") to a database that does not have the "system_data" schema.
>
> Please let me have more information if my assumption were wrong.
>
> Cheers,
> Gabriele
>
> Tip/Note: by default, search_path is set to search in the "$user" and
> public schemas. Therefore, if you connect using the "system_data" user, you
> do not need to force searching in that schema.
>
> --
>  Gabriele Bartolini - 2ndQuadrant Italia
>  PostgreSQL Training, Services and Support
>  Gabriele.Bartolini@2ndQuadrant.it - www.2ndQuadrant.it
>
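A quick way to check, while connected to each database in turn, whether the
schema actually exists there (a plain catalog query, nothing version-specific):

SELECT nspname FROM pg_namespace WHERE nspname = 'system_data';

An empty result in a given database means any session whose search_path names
"system_data" will raise the warning there.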


Re: [GENERAL] select current_setting('transaction_isolation')

2012-06-02 Thread Samba
JDBC does not query the database for the transaction isolation level of
the current session/connection on its own, unless your application [or even
Hibernate] asks for it by calling the
connection.getTransactionIsolation()
method.

I doubt Hibernate would do that either, since it does not need to query the
transaction isolation level of the current connection each time and would most
probably cache that information in each Session instance.

So, it would be better to verify whether your application-specific code relies
on some particular transaction isolation level and ensures that every query run
on the database is actually running under that isolation level.
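If you want to check or pin this down on the database side, a small sketch
(the database name "mydb" is only a placeholder):

SHOW default_transaction_isolation;               -- default for new transactions
SELECT current_setting('transaction_isolation');  -- level of the current transaction
ALTER DATABASE mydb SET default_transaction_isolation = 'read committed';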

Regards,
Samba



On Wed, May 30, 2012 at 5:35 AM, David Kerr  wrote:

> Howdy,
>
> I recently did a log_min_duration_statement=0 run on my app, and found
> ~3million copies of
> "select current_setting('transaction_isolation')"
>
> I'm a Java + Hibernate stack. Does anyone know if this is a Hibernate
> artifact? or a jdbc artifact?
> or something else (implicit to some query pattern or trigger)?
>
> Thanks
>
> Dave
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Disable Streaming Replication without restarting either master or slave

2012-06-01 Thread Samba
I understand the way things work currently, but would it not be more
convenient to let replication be stopped and started via SQL commands
like:

On Master:

select pg_pause_streaming_replication(slave_fqdn);
select pg_resume_streaming_replication(slave_fqdn);


On Slave:

select pg_pause_streaming_replication();
select pg_resume_streaming_replication();


 If Cascaded Streaming Replication is configured, then

On Slave:

select pg_start_streaming_replication(cascaded_slave_fqdn);
select pg_stop_streaming_replication(cascaded_slave_fqdn);



Or an OS executable [ either in the core or a contrib module] like

pg_replication -p primary_fqdn -s slave_fqdn -a pause
pg_replication  -p primary_fqdn -s slave_fqdn -a resume
pg_replication  -p primary_fqdn -s slave_fqdn -a setup


[ the last one would be awesome; I do not think that it is impossible, and it
would be loved by one and all ]

Regards,
Samba



On Wed, May 30, 2012 at 12:07 AM, Fujii Masao  wrote:

> On Wed, May 30, 2012 at 2:38 AM, Michael Nolan  wrote:
> >
> >
> > -- Forwarded message --
> > From: Michael Nolan 
> > Date: Tue, May 29, 2012 at 1:37 PM
> > Subject: Re: [GENERAL] Disable Streaming Replication without restarting
> > either master or slave
> > To: Fujii Masao 
> >
> >
> >
> >
> > On Tue, May 29, 2012 at 1:15 PM, Fujii Masao 
> wrote:
> >>
> >> On Tue, May 29, 2012 at 10:17 PM, François Beausoleil
> >>  wrote:
> >
> >
> >>
> >> > Are per-chance looking for pg_xlog_replay_pause() and
> >> > pg_xlog_replay_resume() ?
> >>
> >> Those can pause and resume WAL replay in the standby, but not streaming
> >> replication. Even while WAL replay is being paused, WAL can be streamed
> >> from the master to the standby.
> >>
> >> Regards,
> >>
> >> --
> >> Fujii Masao
> >
> >
> > So, that means that the only ways to stop streaming replication are to
> stop
> > the slave server, to disable access to the master via the pg_hba.conf
> file
> > (requiring the master configs be reloaded) or to set the trigger file on
> the
> > slave to tell it to stop replicating the master.
> >
> > And if the master/slave are set to synchronous streaming replication,
> your
> > options are more limited, since the master has to know to stop waiting
> for
> > the synchronous slave to respond.
> >
> > Once the slave has gone out of asynchronous replication mode, would it be
> > possible to resume asynchronous replication by stopping the slave server,
> > removing the trigger file, and restarting it in asynchronous streaming
> > replication mode?  This would, at a minimum, depend on how many updates
> have
> > occurred on the master during the time streaming replication was disabled
> > and having all the WAL files available, right?
>
> You'd like to restart the *promoted* standby server as the standby again?
> To do this, a fresh base backup must be taken from the master onto
> the standby before restarting it, even if there has been no update since
> the standby had been promoted.
>
> Regards,
>
> --
> Fujii Masao
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] Disable Streaming Replication without restarting either master or slave

2012-05-29 Thread Samba
Hi,


Is it possible to stop/disable streaming replication without stopping or
restarting either the master or slave servers?

Since stopping or restarting the postgres servers would involve complete
invalidation of the connection pool [Java/JEE app server pool], which may
take a few minutes before the application becomes usable again, it would be
great if there were a way to disable replication [for maintenance reasons like
applying patches or upgrades, etc.].


Thanks and Regards,
Samba


Re: [GENERAL] Global Named Prepared Statements

2012-05-21 Thread Samba
If stored procedures are equivalent to prepared statements [as far as
preparing the query plan is concerned], then what I'm looking for is
perhaps Global Prepared Statements on the client/driver side.

Specifically, it would be good if the JDBC driver prepared all the queries
for invoking stored procedures once per JVM, so that each connection need
not incur the cost of preparing [parsing and storing] those queries per
connection.

Thus we could put all the queries [stored procedure calls] in a single place,
prepare those queries during boot of the server [or deployment of the
web application], and then execute those queries countless times by closing
just the result set object while keeping the statement open forever.

I know this is not the forum to discuss JDBC-related questions, but I put my
thoughts here to complete the question I raised. If folks think this idea
is valid, then I will take it up with the JDBC driver team.

Thanks and Regards,
Samba

=
On Tue, May 15, 2012 at 6:46 PM, Merlin Moncure  wrote:

> On Tue, May 15, 2012 at 1:21 AM, Martijn van Oosterhout
>  wrote:
> > On Tue, May 15, 2012 at 05:38:27AM +0530, Samba wrote:
> >> Hi,
> >>
> >> Does postgresql support Global Prepared Statements, which are prepared
> only
> >> once per server and not per every connection?
> >
> > As pointed out, no.
> >
> >> Problem with per-connection prepared statements is that the onus of
> >> preparing those statements for each connection lies with the client
> which
> >> makes those connections. Ideally, the performance of an application
> must be
> >> decided by the Server that hosts and not by the client that uses the
> >> service.
> >
> > How is this different from using CREATE FUNCTION to create a function
> > which has the desired effect?  This is a well understood and commonly
> > used paradigm.  When using a connection pooler any query plan caching
> > will happen automatically.
>
> this is not necessarily true, right?  for example, 'sql' language
> functions don't cache plans while plpgsql functions mostly (no
> EXECUTE) do.  other languages will typically have ability to save
> plans  (or not).  but yeah, functions generally cover this case quite
> nicely.
>
> i rarely use prepared statements anymore but if you're counting
> microseconds of latency for trivial queries, they still have a niche
> role...but to really see the benefit you'd want to be coding directly
> against the C api and making the appropriate calls (PQexecPrepared,
> etc).
>
> merlin
>
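To illustrate the CREATE FUNCTION alternative mentioned above, here is a
minimal sketch (the function name insert_foo is made up, and a table foo with
matching columns is assumed; plpgsql is used because, as noted above, it
caches plans for plain statements):

CREATE FUNCTION insert_foo(a int, b text, c bool, d numeric)
RETURNS void AS $$
BEGIN
    -- the INSERT's plan is prepared once per session and then reused
    INSERT INTO foo VALUES (a, b, c, d);
END;
$$ LANGUAGE plpgsql;

-- callers from any connection then simply run:
SELECT insert_foo(1, 'Hunter Valley', 't', 200.00);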


[GENERAL] Global Named Prepared Statements

2012-05-14 Thread Samba
Hi,

Does postgresql support Global Prepared Statements, which are prepared only
once per server and not per every connection?

I see a discussion about this in the pgsql-hackers archives, but it does not
reach any conclusion; further, that discussion also deviates a bit from my
question by proposing to cache arbitrary statements when they are used often,
based on statistical analysis.

Here is the original thread:
http://archives.postgresql.org/pgsql-hackers/2008-03/msg01228.php

I noticed that a similar feature request was made in the MySQL community as
well; link here: http://bugs.mysql.com/bug.php?id=38732

The problem with per-connection prepared statements is that the onus of
preparing those statements for each connection lies with the client that
makes those connections. Ideally, the performance of an application should be
decided by the server that hosts the service and not by the client that uses
it.

Hence, it would be great if from a connection C1 I could prepare the
statement:


PREPARE GLOBAL fooplan (int, text, bool, numeric) AS
INSERT INTO foo VALUES($1, $2, $3, $4);


and from connections C2-Cn, I could execute the same statement with bind
parameters:


EXECUTE fooplan(1, 'Hunter Valley', 't', 200.00);

This would help DBAs define the important, costly but fine-tuned queries and
expose only the prepared statement names to the application developers. It
would avoid scenarios where developers forget to prepare all the required
statements for each connection on the one hand, and on the other have to make
sure that they do not prepare them again and again, which would cause errors
and, if not handled properly, might break functionality at unexpected places.

Rather, if one could prepare named statements globally once and then reuse
them throughout the entire uptime of the server, would that not be a lot more
beneficial?

If it is observed that a particular prepared statement is not behaving
properly, then it can be deallocated and fixed and then prepared again.


I'm not sure whether such a feature is already implemented in Postgres, hence
I'm posting it to the general mailing list; if folks feel that it ought to go
to the hackers list, then please guide me accordingly.

Thanks and Regards,
Samba





Re: [GENERAL] Issue with rsync based incremental backup : postgres: startup process waiting for 0000000100000001000000D2

2012-05-09 Thread Samba
Thanks Mike,

It worked!

I even set up streaming replication after doing the incremental backup,
without needing to stop postgres on the primary server.

Here is the script I came up with:

#!/bin/bash
> if [ $# -ne 1 ]; then
> echo "you must specify the hostname of the backup server";
> exit 1;
> fi;
> BACKUP_SERVER=$1
> PGDATA=/var/lib/pgsql/data
> PGXLOG=$PGDATA/pg_xlog
> PGEXEC="sudo su -l postgres -s /bin/bash -c"
> RSYNC="rsync"
> OPTIONS="--archive --checksum --compress --progress"
> EXCLUDES="--exclude postmaster.pid --exclude postgresql.conf --exclude
> pg_hba.conf --exclude server.crt --exclude server.key"
> ROLLOVER=32
> SSH="ssh -q -o StrictHostKeyChecking=no -o BatchMode=yes $BACKUP_SERVER"
> REPLICATION_CHECK="$SSH ps aux | grep postgres | grep wal | grep receiver"
> #On BACKUP_SERVER
> if [ -n "$(service postgresql status | grep "pid[:blank:]*[0-9]*")" ];
> then
> $SSH "service postgresql stop"
> fi;
> #On PRIMARY
> echo "Running VACUUM"
> $PGEXEC "psql -c \"VACUUM FULL;\""
> echo "VACUUM completed"
>
> for f in $(ls -tr $PGXLOG | head -n ${ROLLOVER}); do
>  $RSYNC $OPTIONS $PGXLOG/$f $BACKUP_SERVER:$PGXLOG/
> done;
> $PGEXEC "psql -c \"SELECT pg_start_backup('incremental_backup',true);\""
> $RSYNC $OPTIONS $EXCLUDES --exclude pg_xlog $PGDATA $BACKUP_SERVER:$PGDATA
> $PGEXEC "psql -c \"SELECT pg_stop_backup();\""
> $RSYNC $OPTIONS $PGXLOG $BACKUP_SERVER:$PGXLOG
> $RSYNC $OPTIONS $PGXLOG $BACKUP_SERVER:$PGXLOG
> #On BACKUP_SERVER
> $SSH "service postgresql start"
> if [ -z "$(service postgresql status | grep "pid[:blank:]*[0-9]*")" ];
> then
> echo "Failed to start database on backup server"
> echo "Look into the postgres logs for more details"
> echo "exiting..."
> exit 1;
> fi;
> #need to improve this delay-check to wait until the backup server has
> finished recovery and started into streaming mode
> sleep 30
> if [ -n "$(${REPLICATION_CHECK})" ] ; then
> echo "SUCCESS in synching BACKUP_SERVER with the latest data from
> Primary";
> #On BACKUP_SERVER
>  $SSH "service postgresql stop"
> echo "Stopped the backup server in good state; it will get updated in the
> next scheduled incremental backup"
> else
> echo "FAILED to sync backup server with Primary";
> echo "Leaving the backup server running in the failed state for
> further debugging"
> exit 1;
> fi;
> exit 0;


I hope this would help others in need...

Thanks and Regards,
Samba

----
On Thu, May 3, 2012 at 11:55 PM, Michael Nolan  wrote:

>
>
> On Thu, May 3, 2012 at 11:49 AM, Samba  wrote:
>
>> Hi,
>>
>>
>> Please advise me if what i'm doing is makes sense and is an accepted
>> mechanism for taking backups or if there is any other procedure that i can
>> emplpoy to avoid unnecessarily archiving gigabytes of WAL logs which may be
>> growing many times the size of the actual data directory.
>>
>>
>> Thanks and Regards,
>> Samba
>>
>
> The problem is that rsync isn't copying all the xlog files created during
> the time the rsync is taking place, which is why it is complaining that
> there are files missing.
>
> There may be other logical flaws with your process as well.
>
> Something similar to the steps given in "Starting Replication with only a
> Quick Master Restart" as laid out in the wiki tutorial on binary
> replication might give you a way to make this work.  (You probably won't
> need the restart of the master, since you're not actually setting up
> replication, so you won't be changing the postgresql.conf file on your
> master.)
>
> This uses a two-step process.  First you copy all the files EXCEPT the
> ones on pg_xlog, then you copy those files, so you have a complete set.
>
> See http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
> --
> Mike Nolan
>


[GENERAL] Issue with rsync based incremental backup : postgres: startup process waiting for 0000000100000001000000D2

2012-05-03 Thread Samba
Hi,


I'm trying to make a periodic (daily) incremental backup using rsync, but I am
noticing a message that the postgres startup process is waiting on a particular
XLOG file indefinitely.

I intend to schedule a cron job to periodically execute this rsync backup
process by: a) stopping the postgres server on the backup server, b) rsyncing
the data directory of the original server to the backup server, c) starting
postgres on the secondary, d) ensuring that postgres starts up without any
issue, e) executing some test cases to ensure that there are no data integrity
issues, f) executing more tests to ensure that the data is valid from a
business-logic perspective.

One may ask why I am using an rsync backup instead of log-shipping-based
streaming replication; well, my answer would be that our system has a lot of
updates and fewer insertions, and since the transactions pile up for each
update, the WAL files accumulate so much that we would need to archive much
more data than the actual changes. Further, transferring such a huge amount of
logs is going to consume a lot of time and bandwidth. Instead, I suppose that
with an rsync-based backup we just need to sync the actual data files and need
not worry about archiving the older WAL files, thus saving on transfer time,
bandwidth costs, and WAL archive management.

I would like to know if syncing the $PGDATA directory with rsync is good
enough for incremental backups or whether that would create data integrity
issues.

Every time I run the rsync command, I can successfully start the backup
server and am able to see the latest data from the original server, but I am
a bit worried about this message, which may be hinting that the backup is
not complete and is missing some essential WAL files.

Are there any additional steps or rsync configuration options that I need to
specify in order to make this sync complete?

Please let me know what can be done to make the rsync process complete so
that the backup server does not complain about waiting for missing WAL
files.

Here are the steps I executed on the master and the slave:

On Backup Server
root@backup ~]#
root@backup ~]# service postgresql stop
Stopping postgresql service:   [  OK  ]
root@backup ~]#

On Original Server
root@original ~]#
root@original ~]# su -l postgres -s /bin/bash -c "psql -d test"
test=#
test=# SELECT pg_start_backup('rsync_backup');
test=#\q
root@original ~]#
root@original ~]# rsync --archive --checksum --compress --progress
 --exclude postgresql.conf --exclude pg_hba.conf --exclude server.crt
--exclude server.key /var/lib/pgsql/data/* root@backup.server
:/var/lib/pgsql/data/
root@original ~]# su -l postgres -s /bin/bash -c "psql -d test"
test=#
test=# SELECT pg_stop_backup();
test=#\q
root@original~]#

On Backup Server
root@backup ~]#
root@backup ~]# service postgresql start
Starting postgresql service:   [  OK  ]
root@backup ~]#
root@backup  ~]#  ps aux  | grep postgres
postgres 18210  5.3  1.5 1811408 88540 ?   S20:43   0:00
/usr/pgsql-9.1/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 18220  0.0  0.0 115496  1144 ?Ss   20:43   0:00 postgres:
logger process
postgres 18221  0.0  0.0 1812252 2328 ?Ss   20:43   0:00 postgres:
startup process   waiting for 0000000100000001000000D2
postgres 18222  0.0  0.0 1812180 2280 ?Ss   20:43   0:00 postgres:
writer process
postgres 18223  0.0  0.0 117744  1388 ?Ss   20:43   0:00 postgres:
stats collector process
postgres 18337  0.6  0.1 1814308 8092 ?Ss   20:43   0:00 postgres:
avaya_system_app_user avmgmt 127.0.0.1(34207) idle
postgres 18406  0.4  0.1 1815792 9272 ?Ss   20:43   0:00 postgres:
avaya_system_app_user avmgmt 127.0.0.1(34217) idle
postgres 18477  0.0  0.1 1813684 6680 ?Ss   20:43   0:00 postgres:
avaya_sseuser avmgmt 127.0.0.1(34231) idle
root 18479  0.0  0.0  61160   728 pts/1R+   20:43   0:00 grep
postgres
root@backup ~]#

Please advise me whether what I'm doing makes sense and is an accepted
mechanism for taking backups, or whether there is any other procedure that I
can employ to avoid unnecessarily archiving gigabytes of WAL logs, which may
grow to many times the size of the actual data directory.


Thanks and Regards,
Samba


Re: [GENERAL] Indexing MS/Open Office and PDF documents

2012-03-15 Thread Samba
Word documents can be processed with AbiWord, which can convert any MS Word
document into HTML, LaTeX, PostScript, or plain-text formats with very simple
commands; I guess it also exposes an API which can be integrated into document
parsers/indexers.

Spreadsheets can be processed using the ExcelFormat library
http://www.codeproject.com/Articles/42504/ExcelFormat-Library

or the BasicExcel library
http://www.codeproject.com/Articles/13852/BasicExcel-A-Class-to-Read-and-Write-to-Microsoft

Or even the GNU Gnumeric project has an API to process spreadsheets, which
can be used to extract and index text.

Code to extract text from PDF
http://www.codeproject.com/Articles/7056/Code-to-extract-plain-text-from-a-PDF-file


Overall, I guess there are bits and pieces available on the internet, and
some dedicated effort is needed to assemble them and develop them into a
finished product, namely a document indexer.

Wish you success!


On Fri, Mar 16, 2012 at 2:51 AM, dennis jenkins  wrote:

> On Thu, Mar 15, 2012 at 4:12 PM, Jeff Davis  wrote:
> > On Fri, 2012-03-16 at 01:57 +0530, alexander.bager...@cognizant.com
> > wrote:
> >> Hi,
> >>
> >> We are looking to use Postgres 9 for the document storing and would
> >> like to take advantage of the full text search capabilities. We have
> >> hard time identifying MS/Open Office and PDF parsers to index stored
> >> documents and make them available for text searching. Any advice would
> >> be appreciated.
> >
> > The first step is to find a library that can parse such documents, or
> > convert them to a format that can be parsed.
>
> I don't know about MS-Office document parsing, but the "PoDoFo" (pdf
> parsing library) can strip text from PDFs.  Every now and then someone
> posts to the podofo mailing list with questions related to extracting
> text for the purposes of indexing it in FTS capable database.  Podofo
> has excellent developer support.  The maintainer is quick to accept
> patches, verify bugs, add features, etc...   Disclaimer: I'm not a pdf
> nor podofo expert.  I can't help you accomplish what you want.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Call for Google Summer of Code (GSoC) 2012: Project ideas?

2012-03-13 Thread Samba
Excuse me if what I say below is nonsensical, for I haven't read much about
compression techniques and hence these ramblings are just common sense.

I think the debate about the level (row, page, file) of compression arises when
we strictly stick to the axioms of compression, which require that all the
info needed for decompression must also be present in the same compressed
unit.

Can't we relax this rule a bit and separate out the compression hints into a
separate file, the way we have table data in one file and the positional
references [indexes] in another file? Would that not eliminate this dilemma
about the boundaries of compression?

Perhaps a periodic, autovacuum-like compressor daemon could take up the job
of recompression, keeping the compression hints updated to match the latest
data present in the file/page at that instant.

Regards,
Samba


Re: [GENERAL] Measuring replication lag time

2012-02-24 Thread Samba
> Comparing "pg_controldata" output on prod and standby might help you with
> this.
>

We do use this approach and it is pretty reliable and gives time lag up to
the granularity of checkpoint_timeout.
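On 9.1 and later there are also SQL-level functions that give a finer-grained
view from the standby itself, for example (a sketch, to be run on the standby;
the time-based figure is only meaningful while the master is generating WAL):

SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();

The first gives the time since the last replayed transaction; the second pair
shows how far apart receiving and replaying are.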



On Thu, Feb 23, 2012 at 11:51 AM, Stuart Bishop wrote:

> On Thu, Feb 23, 2012 at 2:58 AM, Greg Williamson
>  wrote:
> > Stuart Bishop shaped the aether to ask:
> >
> >> Hi.
> >>
> >> I need to measure how far in the past a hot standby is, async
> >> streaming replication.
> >
> > Not sure if this will help, but we are using repmgr <
> https://github.com/greg2ndQuadrant/repmgr>; it sets up a monitoring
> schema which we poll )see the "Monitoring and Testing" section ... study
> their source code some and see how they come up with lag times.
>
> Might help indeed. My existing solution already has a small daemon (I
> can't always query the Slony-I sl_status view fast enough for load
> balancing web requests, so I maintain a cache). But repmgr seems to
> cover other work I need to do to keep ops happy so something for me to
> look closer at.
>
>
> --
> Stuart Bishop 
> http://www.stuartbishop.net/
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] Streaming Replication Over SSL

2011-12-01 Thread Samba
Hi all,
I searched a lot to find whether someone has written about this but could not
find any successful attempt, hence I thought of posting it here.

Setting sslmode='require' in the 'primary_conninfo' parameter in the
recovery.conf file on the standby server makes the standby open an SSL
connection to the master server for streaming replication.

If we want to authenticate the master server before we fetch data from it,
then copy the CA certificate of the master's server certificate to the
~/.postgresql directory of the OS user the standby runs as, name it
'root.crt', and set the above-mentioned parameter to sslmode='verify-ca'.

complete string:
primary_conninfo='host=master port=5432 sslmode=require' or

primary_conninfo='host=master port=5432 sslmode=verify-ca'

However, I'm not sure how to confirm whether the standby server is really
making the connection to the master and fetching the XLOG contents over SSL. I
tried intercepting the traffic using Wireshark but could not find any hint to
that effect; all it says is that the traffic is over TCP.

Can someone suggest a way to confirm that this setting really makes
streaming replication work over SSL?

Thanks and Regards,
Samba


Re: [GENERAL] How to restore the table space tar files created by pg_basebackup?

2011-12-01 Thread Samba
Hi Venkat,
I verified that the tablespaces are located in actual directories and are not
symbolic links. Another interesting thing is that the content in these
additional tar.gz files is already present in the base.tar.gz file.

Regards,
Samba

--
On Thu, Dec 1, 2011 at 11:29 AM, Venkat Balaji wrote:

>
> Do you have Tablespace directories with a softlink to the data directory ?
>
> Thanks
> VB
>
> On Wed, Nov 30, 2011 at 7:42 PM, Samba  wrote:
>
>> Hi all,
>>
>> I have taken a base backup of my master server using pg_basebackup
>> command as below:
>>  pg_basebackup -D /tmp/PostgresBackup/ -Ft -Z 9 -l masterbackup -h
>> localhost -U replication -w
>>
>> The above created 4 tar files, namely: 16394.tar.gz  16395.tar.gz
>>  16396.tar.gz  base.tar.gz
>>
>> I do know that my database contains 3 table spaces in addition to
>> pg_default and pg_global ( I guess, that is why it created those three
>> numbered tar.gz files, plus one base.tar.gz file ) and my master and
>> standby servers are identical by all means.
>>
>> Now, I'm not sure how can I restore these files on the standby server. I
>> could restore the base.tar.gz into the data directory on standby and the
>> streaming replication has started working properly. But I'm not sure what
>> to do with these additional numbered gz files which contains the same data
>> that is already contained in the base.tar.gz file.
>>
>> Can some one explain me what to do with these files? The documentation
>> for pg_basebackup does not mention this information, it just says that a
>> different variant of the command will fail if there are multiple table
>> spaces.
>>
>> Another related query is if  we can specify the name of the backup file
>> instead of leaving it to be base.tar.gz file.
>>
>> Thanks and Regards,
>> Samba
>>
>>
>>
>>
>>
>
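For reference, the numbered tar files produced by pg_basebackup correspond to
tablespace OIDs; on 9.1 the mapping and target directories can be listed with
(later releases replace spclocation with pg_tablespace_location(oid)):

SELECT oid, spcname, spclocation FROM pg_tablespace;

Each numbered tar.gz is meant to be unpacked into the directory reported for
its OID on the standby, while base.tar.gz covers the main data directory.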


[GENERAL] How to restore the table space tar files created by pg_basebackup?

2011-11-30 Thread Samba
Hi all,

I have taken a base backup of my master server using pg_basebackup command
as below:
 pg_basebackup -D /tmp/PostgresBackup/ -Ft -Z 9 -l masterbackup -h
localhost -U replication -w

The above created 4 tar files, namely: 16394.tar.gz  16395.tar.gz
 16396.tar.gz  base.tar.gz

I do know that my database contains 3 tablespaces in addition to
pg_default and pg_global (I guess that is why it created those three
numbered tar.gz files, plus one base.tar.gz file), and my master and
standby servers are identical in all respects.

Now, I'm not sure how I can restore these files on the standby server. I
could restore base.tar.gz into the data directory on the standby, and the
streaming replication started working properly. But I'm not sure what
to do with these additional numbered gz files, which contain the same data
that is already contained in the base.tar.gz file.

Can someone explain to me what to do with these files? The documentation
for pg_basebackup does not mention this; it just says that a different
variant of the command will fail if there are multiple tablespaces.

Another related query is whether we can specify the name of the backup file
instead of leaving it as base.tar.gz.

Thanks and Regards,
Samba


Re: [GENERAL] equivalent to "replication_timeout" on standby server

2011-11-04 Thread Samba
Thanks Fujii for that hint...

I searched around on the internet for that trick, and it looks like we can
make the standby close its connection to the master much earlier than it
otherwise would; that is good enough for me for now.
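For reference, the keepalive libpq parameters Fujii suggested go straight into
the existing primary_conninfo line; the values below are only illustrative:

primary_conninfo = 'host=master port=5432 user=repman keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=3'

With that, the walreceiver should notice a dead connection after roughly
keepalives_idle + keepalives_interval * keepalives_count seconds.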

But there still seem to be two problem areas that could be improved over
time...

   - although both master (with replication_timeout) and slave (with the TCP
   keepalive options in the primary_conninfo parameter) close the connection
   fairly quickly (based on the TCP idle-connection timeout), as of now they do
   not log such information. It would be really helpful if such disconnects
   were logged with appropriate severity, so that the problem can be identified
   early and so that patterns and history of such issues can be tracked.
   - Presently, neither the master nor the standby server attempts
   to resume streaming replication when they happen to see each other after
   some prolonged disconnect. It would be better if the master, the slave, or
   both servers made periodic checks to find out whether the other is reachable
   and resumed the replication (if possible, or else logged a message that a
   full sync may be required).


Thanks and Regards,
Samba

--
On Fri, Nov 4, 2011 at 7:25 AM, Fujii Masao  wrote:

> On Thu, Nov 3, 2011 at 12:25 AM, Samba  wrote:
> > The postgres manual explains the "replication_timeout" to be used to
> >
> > "Terminate replication connections that are inactive longer than the
> > specified number of milliseconds. This is useful for the primary server
> to
> > detect a standby crash or network outage"
> >
> > Is there a similar configuration parameter that helps the WAL receiver
> > processes to terminate the idle connections on the standby servers?
>
> No.
>
> But setting keepalive libpq parameters in primary_conninfo might be useful
> to detect the termination of connection from the standby server.
>
> Regards,
>
> --
> Fujii Masao
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>


[GENERAL] equivalent to "replication_timeout" on standby server

2011-11-02 Thread Samba
Hi all,

The postgres manual explains the "replication_timeout" to be used to

"Terminate replication connections that are inactive longer than the
specified number of milliseconds. This is useful for the primary server to
detect a standby crash or network outage"


Is there a similar configuration parameter that helps the WAL receiver
processes to terminate the idle connections on the standby servers?

It would be very useful (for monitoring purposes) if the termination of such
an idle connection on either the master or standby server were logged with an
appropriate message.

Could someone tell me whether this is possible with postgres-9.1.1?


Thanks and Regards,
Samba


[GENERAL] Disable Streaming Replication ==> Stop WAL Sender on master and WAL receiver on Slave

2011-11-01 Thread Samba
Hi all,

I read in the pg mail archives that the postgresql-9.1 release added support
for pausing and resuming (or disabling and enabling) streaming replication,
but I could not find any relevant configuration settings for achieving this.

The closest that I could find so far are:

replication_timeout (integer), which stops replication connections that are
inactive (perhaps due to standby failure)
replication_timeout_action (enum), which was mentioned in some mail archives
but is not present in the current documentation.
pg_xlog_replay_pause() and pg_xlog_replay_resume(), which just pause the
recovery (or application of WAL contents) on the standby but still let the
streaming replication continue (accumulating the WAL files on the standby
without applying them)

Interestingly, even in the documentation for pg_xlog_replay_pause, it was
written that

"If streaming replication is disabled, the paused state may continue
indefinitely without problem. While streaming replication is in progress
WAL records will continue to be received, which will eventually fill
available disk space, depending upon the duration of the pause, the rate of
WAL generation and available disk space"
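For completeness, the pause/resume functions mentioned above are used like
this on the standby (note that they only pause WAL replay, not streaming
itself):

SELECT pg_xlog_replay_pause();
SELECT pg_is_xlog_replay_paused();   -- returns true while replay is paused
SELECT pg_xlog_replay_resume();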


But how do I disable streaming replication, with the effect that both the WAL
sender on the master and the WAL receiver on the slave get stopped? Similarly,
when I enable streaming replication, the WAL sender and WAL receiver processes
would need to be started.

Could someone please explain the GUC settings or the SQL function calls
that help achieve this?


Thanks and Regards,
Samba


Re: [GENERAL] Backup Database Question

2011-10-06 Thread Samba
What about pg_rman?

-
On Fri, Oct 7, 2011 at 1:20 AM, Raymond O'Donnell  wrote:

> On 06/10/2011 20:18, Carlos Mennens wrote:
> > On Thu, Oct 6, 2011 at 3:12 PM, John R Pierce 
> wrote:
> >>   /path/to/pg_dumpall | gzip > /path/to/pgbackup-$(date -I).sql.gz
> >
> > Thanks John. I've never written a script so do I just use 'Vim' to
> > open a new file and just paste the following line?
> >
> > #!/bin/bash
> > /usr/bin/pg_dumpall | gzip > /var/db_backup/pg_backup-$(date -I).sql.gz
> >
> > Is that all I need to do or is there more steps / data involved?
>
> I think you also need to make it executable:
>
>  chmod u+x my_backup_script
>
> ...or something like that.
>
> Ray.
>
>
> --
> Raymond O'Donnell :: Galway :: Ireland
> r...@iol.ie
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Streaming Replication: Observations, Questions and Comments

2011-08-25 Thread Samba
The problem with maintaining a separate archive is that one needs to write
some additional scripts to periodically remove older log files from the
archive, and that gets complicated with a setup having one master and
multiple slaves.

I think it is a better idea to combine compression and cleanup in the core
itself, maybe in a later release. A better approach to cleanup is that the
walsender process decides when to clean up a particular logfile based on the
feedback from all the registered slaves. If a slave is not reachable or
falls behind for too long, then that slave should be banned from the setup
(log the event in pg_replication.log ???). The replication status for each
slave can be maintained in something like a pg_slave_replica_status catalog
table.

When it comes to compression, the walsender can compress each chunk of data
that it streams (increasing the streaming delay may improve the compression
ratio, hence a balance has to be struck between compression and the
sustainable data loss in case of failure).

Although I believe this design would be much better than leaving it to
external utilities, I'm not that good at C and hence am only proposing a
design and not a patch. I hope my suggestion will be received in good spirit.

Thanks and Regards,
Samba

PS:
I wrongly stated that the master server had to be restarted in case of long
disconnects; sorry, that was not true. But I still feel that requiring a
restart of the standby server to resume replication should be avoided, if
possible.

And I strongly feel that a breakage in replication must be logged by both the
master server and the concerned slave servers.

---
On Wed, Aug 24, 2011 at 11:03 PM, Alan Hodgson  wrote:

> On August 24, 2011 08:33:17 AM Samba wrote:
> > One strange thing I noticed is that the pg_xlogs on the master have
> > outsized the actual data stored in the database by at least 3-4 times,
> > which was quite surprising. I'm not sure if 'restore_command' has
> anything
> > to do with it. I did not understand why transaction logs would need to be
> > so many times larger than the actual size of the database, have I done
> > something wrong somewhere?
>
> If you archive them instead of keeping them in pg_xlog, you can gzip them.
> They compress reasonably well.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
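A sketch of what that looks like in practice, using the archive path from the
original post (adjust to taste), together with the matching restore_command on
the standby:

archive_command = 'gzip < %p > /archives/data/pgsql/pg_xlog/%f.gz'
restore_command = 'gunzip < /archives/data/pgsql/pg_xlog/%f.gz > "%p"'

A compressed archive also makes it cheaper to keep enough history around
without inflating wal_keep_segments.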


[GENERAL] Streaming Replication: Observations, Questions and Comments

2011-08-24 Thread Samba
Hi all,

We have a postgres-9.0 streaming replication set up where we keep the WAL
segments on the master amounting to 10 GB so that we can survive longer
periods of disconnect between master and slave. We do not use any shared
storage space for archiving WAL logs. (the shared disk server may turn out
to be another point of failure, which we would want to avoid)

Here are our basic configuration parameters in the relevant files:

postgresql.conf on master:
wal_keep_segments = 640  # previously 32; in logfile segments, min 1, 16MB each
archive_mode = on        # allows archiving to be done (change requires restart)
archive_command = 'cp -v %p /archives/data/pgsql/pg_xlog/%f'


postgresql.conf on slave:
wal_level = hot_standby
hot_standby = on

recovery.conf on slave:
standby_mode = 'on'
primary_conninfo = 'host=ip.add.ress port=5432 user=repman'
trigger_file = '/var/lib/pgsql/data/stop.replication'
restore_command = 'cp -i /archives/data/pgsql/pg_xlog/%f "%p"'

The master and slave servers are separated by thousands of miles, and the
network bandwidth comprises just an ordinary 1 Mbps DSL line. Both the master
server and the slave server have the /archives partition mounted and synced
with csync2 between the master and the slave systems. I'm not sure if this is
the correct way of configuring streaming replication, but I will explain what
worked for us and what we are still left wanting:

Under heavy inserts/updates/deletes on the master (load generated by stored
procedures), we noticed that the slave fell far behind the master, which
resulted in breakage of replication. Hence we changed from 32 log file
segments to 640, which corresponds to 10 GB, so that we can survive either
very heavy spikes of load or even a week's disconnect of the slave (although
alarms would be raised appropriately for the same effect).

One strange thing I noticed is that the pg_xlogs on the master have grown to
at least 3-4 times the size of the actual data stored in the database, which
was quite surprising. I'm not sure if 'restore_command' has anything to do
with it. I did not understand why transaction logs would need to be so many
times larger than the actual size of the database; have I done something wrong
somewhere?

Another interesting fact we noticed is that once the replication had been
broken for a longer time (the walsender and walreceiver processes had died by
this time), we had to restart not only the slave server but also the master
server, which was quite strange. Shouldn't the master server start a walsender
process (if none is running) the moment it receives a request for streaming?
Similarly, why should the slave be restarted just to start replication again?
Why can't these two processes be independently started and stopped by the
postmaster process as and when necessary, according to the need to replicate
or not?

Another thing I noticed was that the slave server logged that it was out of
sync and hence was closing replication, but the master did not say anything
about this breakage of replication.

So summing up the above, I would like to have some pointers to understand
the following, which I think will benefit many others as well:

   - Do I need to 'archive' since I'm storing quite a significant number of
   logfile segments that can help sustain disconnect for almost a week?
   - Why did the xlog data accumulate so much as to be 3-4 times that of the
   actual size of the database growth during the same period?
   - why should we restart the postgres service on slave and/or master if
   we need to join a slave back into replication after a long disconnect?
   - why is the master not complaining about the loss of replication? (of
   course, the slave did complain about the disconnect, or rather its inability
   to continue accepting data from the master on account of the difference in
   xlog location).

Some of the above might be because of wrong configuration, while some may
give hints for future enhancements. I hope this will start a healthy
discussion on the areas where streaming replication needs to be improved and
strengthened.

Thanks and Regards,
Samba


[GENERAL] Statistics about Streaming Replication deployments in production

2011-07-28 Thread Samba
Hi all,
We, at Avaya India, have been using postgres for a few years and are very
happy with the stability and performance of the system. We want to utilise the
newly released streaming replication feature to build a master-(multiple)slave
based, geographically redundant setup. We ship to our customers a product that
stores its transactional data in postgres, and the size of the data would
accumulate to somewhere around a couple of hundred gigabytes over a period of
time. It will have a heavy read load and an average write load.

One concern that is being raised by our management team is the relative
stability and 'industrial strength' of streaming replication. Considering that
this feature is just one year old, doubts are expressed about

   - data integrity -- cancelled long-running transactions on the primary must
   not be applied on the standby
   -  reliability -- what if the network link is broken, or one of the pair
   crashes while log segments for a huge committed transaction are being
   sent from master to standby?
   -  guaranteed recovery (on failover) -- at any moment, one should be able
   to turn the standby into the active server and start using it (there should
   not be a scenario where the master crashed and the slave could not be made
   active)


On account of these, we thought it would be reassuring to our management
team if we could cite a few existing production deployments and their success
stories.

I think one year is sufficient time for any product/feature to be thoroughly
tested for all its strengths and weaknesses; so would it be too much to ask
the vast postgres customer base about their experiences with streaming
replication -- the good, the bad, and perhaps the best and the ugly too? It
would be great if customers could give their identity (employer info), but it
is not necessary.

Thanks and Regards,
Samba


Re: [GENERAL] [REPOST] plpgsql : looping over multidimensional array : getting NULL for subdimension

2010-12-02 Thread Samba
Thank you Merlin for your answer,

   I tried that and it is working as you said.
   But when I do
   other_array=some_array[1:2];
   I'm getting the entire
'{{samba,sarath,sreenivas},{samba,mukhesh,pavan}}';

   not just the second subarray, i.e. '{{samba,mukhesh,pavan}}'.

   Is there any way I can get only the corresponding slice of the array?

   Otherwise I need to call this function multiple times from my Java
client, once per sub-array, which I think will be more expensive
than doing it in just one stored procedure call.
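One workaround, sketched below (assuming 9.0 or later for the DO block;
unnest/array_agg need 8.4+), is to flatten each two-dimensional slice back
into a one-dimensional array inside the loop, so a single call can walk all
the sub-arrays:

DO $$
DECLARE
    some_array  varchar[][] := '{{samba,sarath,sreenivas},{samba,mukhesh,pavan}}';
    other_array varchar[];
    i int;
BEGIN
    FOR i IN array_lower(some_array, 1) .. array_upper(some_array, 1) LOOP
        -- some_array[i:i] is still 2-D ({{...}}); unnest + array_agg flattens it to 1-D
        SELECT array_agg(x) INTO other_array FROM unnest(some_array[i:i]) AS x;
        RAISE NOTICE 'sub-array %: %', i, other_array;
    END LOOP;
END
$$;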

Regards,
Samba



On Thu, Dec 2, 2010 at 2:04 AM, Merlin Moncure  wrote:

> On Wed, Dec 1, 2010 at 10:51 AM, Samba  wrote:
> > Hi all,
> >
> > I'm trying to loop over a multidimensional array and find if any of the
> > elements in a sub-dimensional array are matching some known criteria but
> > facing issue with NULL for the sub arrays.
> >
> > I have a data structure that looks like:
> >
> >some_array VARCHAR[][] :=
> > '{{samba,sarath,sreenivas},{samba,mukhesh,pavan}}';
> >
> > I'm trying to assign the individual sub arrays to other array elements
> like:
> >
> >   other-array VARCHAR[];
> >
> >   other_array=some_array[1];
> >
> > and I'm expecting to get '{samba,sarath,sreenivas}' for index 1 and
> > {samba,mukhesh,pavan} for index 2;
> > however, I'm getting NULL.
> >
> > Can some one explain the way I can assign subarrays to other array
> elements
> > plpgsql?
>
> There is no truly effective way to do that.  you can however slice the
> array which is almost as good:
> other_array=some_array[1:1];
> note this will give {{samba,sarath,sreenivas}}, not
> {samba,sarath,sreenivas}
>
> merlin
>


[GENERAL] plpgsql : looping over multidimensional array : getting NULL for subdimension

2010-12-01 Thread Samba
Hi all,

  I'm trying to loop over a multidimensional array and find whether any of
the elements in a sub-dimensional array match some known criteria, but I am
facing an issue with NULL for the sub-arrays.

I have a data structure that looks like:

   some_array VARCHAR[][] :=
'{{samba,sarath,sreenivas},{samba,mukhesh,pavan}}';

I'm trying to assign the individual sub-arrays to other array variables like:

  other_array VARCHAR[];

  other_array=some_array[1];

and I'm expecting to get '{samba,sarath,sreenivas}' for index 1 and
'{samba,mukhesh,pavan}' for index 2;
however, I'm getting NULL.

Can someone explain how I can assign subarrays to other array variables in
plpgsql?

Thanks and Regards,
Samba


[GENERAL] [REPOST] plpgsql : looping over multidimensional array : getting NULL for subdimension

2010-12-01 Thread Samba
Hi all,

I'm trying to loop over a multidimensional array and find whether any of the
elements in a sub-dimensional array match some known criteria, but I am
facing an issue with NULL for the sub-arrays.

I have a data structure that looks like:

   some_array VARCHAR[][] :=
'{{samba,sarath,sreenivas},{samba,mukhesh,pavan}}';

I'm trying to assign the individual sub-arrays to other array variables like:

  other_array VARCHAR[];

  other_array=some_array[1];

and I'm expecting to get '{samba,sarath,sreenivas}' for index 1 and
'{samba,mukhesh,pavan}' for index 2;
however, I'm getting NULL.

Can someone explain how I can assign subarrays to other array variables in
plpgsql?

Thanks and Regards,
Samba

PS:
Sorry for spamming, but my earlier mail had some issue with the subject line
because of resending a stalled post.


Re: [GENERAL] Stalled post to pgsql-general

2010-12-01 Thread Samba
REPOST!

On Wed, Dec 1, 2010 at 8:34 PM,  wrote:

> Your message to pgsql-general has been delayed, and requires the approval
> of the moderators, for the following reason(s):
>
> The author (Samba )
>  is not a member of any of the restrict_post groups.
>
> If you do not wish the message to be posted, or have other concerns,
> please send a message to the list owners at the following address:
>  pgsql-general-ow...@postgresql.org
>
>
> -- Forwarded message --
> From: Samba 
> To: pgsql-general@postgresql.org
> Date: Wed, 1 Dec 2010 20:33:09 +0530
> Subject: plpgsql : looping over multidimensional array : getting NULL for
> subdimension
> Hi all,
>
>   I'm trying to loop over a multidimensional array and find if any of
> the elements in a sub-dimensional array are matching some known criteria but
> facing issue with NULL for the sub arrays.
>
> I have a data structure that looks like:
>
>*some_array VARCHAR[][] :=
> '{{samba,sarath,sreenivas},{samba,mukhesh,pavan}}'*;
>
> I'm trying to assign the individual sub arrays to other array elements
> like:
>
>   other-array VARCHAR[];
>
>   other_array=some_array[1];
>
> and I'm expecting to get '{samba,sarath,sreenivas}' for index 1 and 
> *{samba,mukhesh,pavan}
> *for index 2;*
> *however, I'm getting NULL.
>
> Can some one explain the way I can assign subarrays to other array elements
> plpgsql?
>
> Thanks and Regards,
> Samba
>
>