Re: [GENERAL] What is better method to backup postgresql DB.

2010-10-27 Thread Simon Riggs
On Tue, 2010-10-26 at 14:27 -0400, Steeles wrote:

> New to postgresql. I need to back up a postgresql DB; which way is better
> for backing up the DB?
>  
> From training, I learned that we can back up the whole PGDATA and other
> directories to achieve the backup goal; originally I was planning to
> schedule jobs that use pg_dump to back up all DBs.
>  
> So which way is better for backup, or is there another way to back up a
> PG DB?

Backup is explained in a detailed chapter in the new book:
PostgreSQL Administration Cookbook, available from the link below.
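
Very roughly, the filesystem-level approach you mention uses the backup-mode
functions (this is only a sketch; the continuous-archiving chapter of the
docs has the real procedure, and pg_dump remains the simpler logical-backup
route):

  SELECT pg_start_backup('nightly');   -- put the cluster into backup mode
  -- ... copy the whole data directory (PGDATA) with an OS-level tool ...
  SELECT pg_stop_backup();             -- end backup mode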

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_get_procdef ?

2010-10-27 Thread Marc Mamin
Hello,

I want to export a list of procedure definitions, which seems to be a
hard nut to crack :-(
A solution could be to use a combination of pg_dump and pg_restore, but
this also requires some time investment.
It would be nice if pg_dump could be more selective about the objects to
dump...

Does anyone have a handmade solution for a pg_get_procdef function?

I'm still on Postgres 8.3 ...

Thanks for any help,

Marc Mamin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_get_procdef ?

2010-10-27 Thread hubert depesz lubaczewski
On Wed, Oct 27, 2010 at 11:21:43AM +0200, Marc Mamin wrote:
> Hello,
> 
> I want to export a list of procedure definitions, which seems to be a
> hard nut to crack :-(
> A solution could be to use a combination of pg_dump and pg_restore, but
> this also requires some time investment.
> It would be nice if pg_dump could be more selective about the objects to
> dump...
> 
> Does anyone have a handmade solution for a pg_get_procdef function?
> 
> I'm still on Postgres 8.3 ...

pg_dump -Fc -s, then pg_restore -l | grep to pick out the functions, then
pg_restore -L with that list, will be fast.

If that's not good enough for you, do:
psql -E
\df+ function

and check the queries. Getting the function definition from there should be
simple.
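
For example, on 8.3 something along these lines (a sketch, not the exact
statement psql issues) pulls the raw pieces out of pg_proc:

  SELECT n.nspname  AS schema,
         p.proname  AS name,
         pg_catalog.oidvectortypes(p.proargtypes) AS arguments,
         pg_catalog.format_type(p.prorettype, NULL) AS returns,
         l.lanname  AS language,
         p.prosrc   AS source
    FROM pg_catalog.pg_proc p
    JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
    JOIN pg_catalog.pg_language  l ON l.oid = p.prolang
   WHERE n.nspname NOT IN ('pg_catalog', 'information_schema');

You still have to wrap the output in CREATE FUNCTION ... yourself, but all
the ingredients are there.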

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Advice needed on application/database authentication/authorization/auditing model

2010-10-27 Thread Dmitriy Igrishin
Hey Craig,

2010/10/27 Craig Ringer 

> On 27/10/10 04:49, Dmitriy Igrishin wrote:
> > Hey Tony,
> >
> > 2010/10/27 Tony Cebzanov 
> >
> > On 10/23/10 11:01 AM, Craig Ringer wrote:
> > > Yep. As for not explicitly mentioning "lower" roles when granting a
> > > higher role (ie "admin" isn't also a "user") - role inheritance.
> >
> > I knew about role inheritance, I just didn't know about the
> > pg_has_role() function for determining if a user has a role.  That's
> > helpful, but I really don't want to be hitting the database with a
> > pg_has_role() call every time I want to check if a user should
> > have access to a certain page or function in my application.
> >
> > Why not? Performance? It's just one function call.
>
> It's potentially a fair bit more than that. It requires a new connection
> (tcp connection, backend startup, auth, etc) or borrowing one from a
> pool. If the  pool is server side there's still a tcp connection with
> the associated latency. Then there's a round trip for the query and
> result. Processing the result. etc. It's not trivial, especially if your
> client and server aren't co-located.
>
This applies to any arbitrary SQL command. I don't see the problem here.
Caching the privileges on the client side is a good idea, but there is a
perennial problem that I see very clearly: cache invalidation.
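
Just for reference, the call under discussion is a one-liner (the role name
here is made up):

  SELECT pg_has_role(current_user, 'admin', 'MEMBER');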


> Like you, I'd suggest using information_schema for the job.
>
> --
> Craig Ringer
>
> Tech-related writing: http://soapyfrogs.blogspot.com/
>



-- 
// Dmitriy.


Re: [GENERAL] Why Select Count(*) from table - took over 20 minutes?

2010-10-27 Thread Merlin Moncure
On Tue, Oct 26, 2010 at 4:30 PM, Diego Schulz  wrote:
> On Tue, Oct 26, 2010 at 2:18 PM, Ozz Nixon  wrote:
>> I am the only user on this system right now, and one table select count(*) 
>> took over 20 minutes:
>>
>> wikitags exists and has 58,988,656 records.
>>
>> Structure (in pascal) is:
>>
>>   quer.SQL.Add('create table '+DBTags+' (');
>>   quer.SQL.Add('   pagename '+SQL_TITLE+'(100) not null,');
>>   quer.SQL.Add('   tagword '+SQL_TITLE+'(15) not null,');
>>   quer.SQL.Add('   soundex2 '+SQL_TITLE+'(4) not null,');
>>   quer.SQL.Add('   metaphone '+SQL_TITLE+'(15) not null,');
>>   quer.SQL.Add('   metaphone2 '+SQL_TITLE+'(22) not null,');
>>   quer.SQL.Add('   carverphone '+SQL_TITLE+'(22) not null,');
>>   quer.SQL.Add('   instances '+SQL_INT32+' not null,');
>>   if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add('   constraint '+DBTags+'_PK');
>>   quer.SQL.Add('   primary key(pagename, tagword, instances)');
>>   quer.SQL.Add(')');
>>
>> where SQL_TITLE = 'varchar', SQL_INT32 = 'int'
>>
>> I have hung indexes off each column, to resolve my previous 
>> "performance" issue from 3+ weeks ago. However, COUNT() is still dog slow; 
>> this table is write once, read many... *never* update, nor delete.
>>
>> Any suggestions?
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
> You could try something like what's suggested in this blog post:
> http://jakub.fedyczak.net/post/26
>
> I haven't actually tried it, but I think it should work OK.

Before you try that, you should ask yourself if you really need a 100%
accurate count.  A reasonable approximation is maintained by the
stats system (pg_class.reltuples); it will often do, and it is free.
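
Something like this reads that estimate (using the table from your post as
an example):

  SELECT reltuples::bigint AS approx_rows
    FROM pg_class
   WHERE relname = 'wikitags';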

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] How to merge data from two separate databases into one (maybe using xlogs)?

2010-10-27 Thread Daniel . Crespo
Hi,

 

The quick question is:

How (if possible) can I get the data (maybe through the xlogs) from two
separate databases and merge it into one?

 

For those that want to know my situation, here it is:

 

I have two postgresql 9.0 servers that are working as master/slave using
streaming replication. At some point, the slave will become the primary. As
most of you already know, I can do this by just creating a trigger file.
Everything is fine so far: when I create the trigger file, the slave
becomes writable, thereby becoming the master.

 

My setup is using pgpool on top of 2 databases, as shown in the
following drawing:

 

Server A                 Server B

+-----------+            +-----------+
|    AP0    |            |    AP1    |
|     |     |            |     |     |
|     V     |            |     V     |
|  pgpool0 -|---.    .---|- pgpool1  |
|     |     |    \  /    |     |     |
|     V     |     \/     |     V     |
|           |     /\     |           |
|   DB0 <---|----'  '----|---> DB1   |
+-----------+            +-----------+
 172.10.10.2              172.10.10.3

 

If Server A is disconnected from the network, pgpool0 will not see DB1,
and pgpool1 will not see DB0. Therefore, pgpool0 will degenerate the
failed backend. (For those who don't know pgpool, it's just a database
pool that has the same interface as postgresql. The application thinks
it's talking to a postgres database)

 

In my case above, this is what will happen:

1. pgpool0 detects DB1 disconnection and issues a failover to DB0, which
is already the master. That is, there is nothing to do.

2. pgpool1 detects DB0 disconnection and issues a failover to DB1, which
is the slave. That is, it creates the trigger file on DB1. At this point,
both DBs are writable.

3. At this point, a split-brain problem could arise. That is, AP0 could
potentially insert new valid records through pgpool0, as could AP1
through pgpool1. This means both DB0 and DB1 could potentially have
valid data.

 

I'm aware of the existence of techniques like STONITH or heartbeat. The
problem is that AP* must be running, available on the network, and
connected to a database, and therefore cannot just be shut down.

 

So, the question would be: How can I merge data from DB0 and DB1
and make it available in the new master, whichever is chosen? Any ideas?

 

Thanks in advance,

-Daniel

 

 



Re: [GENERAL] DB become enormous with continuos insert and update

2010-10-27 Thread Vick Khera
On Tue, Oct 26, 2010 at 5:55 PM, John R Pierce  wrote:
> never do VACUUM FULL.   Rather, use CLUSTER to rebuild heavily used tables
> in order of the most frequently used key (typically the PK), however this
> requires a global table lock for the duration, so should only be used when
> your application is relatively inactive.

Another trick I like to use is to alter a table column from type
integer to type integer (that is, to the same type it already has).  This
causes pg to rewrite the table without the sorting that CLUSTER implies,
gives basically the same end result, and is MVCC-safe to boot.
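
A minimal sketch of what I mean (table and column names are just
placeholders; any integer column of the table will do):

  ALTER TABLE my_table ALTER COLUMN some_int_col TYPE integer;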

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to merge data from two separate databases into one (maybe using xlogs)?

2010-10-27 Thread Vick Khera
On Wed, Oct 27, 2010 at 9:58 AM,   wrote:
> So, the question would be: How can I merge data from DB0 and DB1 and
> make it available in the new master, whichever is chosen? Any ideas?

Perhaps investigate bucardo for replication, as it is supposed to be
able to help in situations like this.  I think you will have to write
some policy so it knows how to resolve conflicting updates unless you
don't care which one wins.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to merge data from two separate databases into one (maybe using xlogs)?

2010-10-27 Thread Lennin Caro
--- On Wed, 10/27/10, Vick Khera  wrote:

From: Vick Khera 
Subject: Re: [GENERAL] How to merge data from two separate databases into one 
(maybe using xlogs)?
To: "pgsql-general" 
Date: Wednesday, October 27, 2010, 8:26 PM

On Wed, Oct 27, 2010 at 9:58 AM,   wrote:
> So, the question would be: How can I merge data from DB0 and DB1 and
> make it available in the new master, whichever is chosen? Any ideas?

Perhaps investigate bucardo for replication, as it is supposed to be
able to help in situations like this.  I think you will have to write
some policy so it knows how to resolve conflicting updates unless you
don't care which one wins.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

IMHO pgpool is the solution



  

Re: [GENERAL] How to merge data from two separate databases into one (maybe using xlogs)?

2010-10-27 Thread Vick Khera
On Wed, Oct 27, 2010 at 4:37 PM, Lennin Caro  wrote:

> IMHO pgpool is the solution
>

How does that solve the problem of having two disconnected networks, each
thinking their DB is the master?


[GENERAL] Should PQconsumeInput/PQisBusy be expensive to use?

2010-10-27 Thread Michael Clark
Hello everyone.

I have been investigating the PG async calls and trying to determine whether
I should go down the road of using them.

In doing some experiments I found that using
PQsendQueryParams/PQconsumeInput/PQisBusy/PQgetResult produces slower
results than simply calling PQexecParams.
Upon some investigation I found that not calling PQconsumeInput/PQisBusy
produces results in line with PQexecParams (which PQexecParams seems to be
doing under the hood).

I profiled my test and found this calling stack:
(This is OS X 10.6)

lo_unix_scall
recvfrom$UNIX2003
recv$UNIX2003
pqsecure_read
pqReadData
PQconsumeInput
...


This showed up as the hottest part of the execution by far.  This was a
pretty simple test of fetching 6000+ rows.

If I remove the PQconsumeInput/PQisBusy calls, which essentially makes the
code blocking, this hot spot goes away.

Fetching 1000 rows goes from <.5 seconds to >3 seconds when I have the
PQconsumeInput/PQisBusy calls in.


I was wondering if maybe I am doing something wrong, or if there is a
technique that might help reduce this penalty?

Thanks in advance for any suggestions,
Michael.

P.S. here is a code snippet of what I am doing basically:
(please keep in mind this is just test code and rather simplistic...)

    int send_result = PQsendQueryParams(self.db,
                                        [sql UTF8String],
                                        i,
                                        NULL,
                                        (const char *const *)vals,
                                        (const int *)lens,
                                        (const int *)formats,
                                        kTextResultFormat);
    int consume_result = 0;
    int is_busy_result = 0;

    while ( ((consume_result = PQconsumeInput(self.db)) == 1) &&
            ((is_busy_result = PQisBusy(self.db)) == 1) )
        ;

    if (consume_result != 1)
        NSLog(@"Got an error in PQconsumeInput");

    PGresult* res = PQgetResult(self.db);
    while (PQgetResult(self.db) != NULL)
        NSLog(@"Oops, seems we got an extra response?");


Re: [GENERAL] How to merge data from two separate databases into one (maybe using xlogs)?

2010-10-27 Thread Daniel . Crespo
> On Wed, Oct 27, 2010 at 4:37 PM, Lennin Caro  wrote:
>> IMHO pgpool is the solution
>
> How does that solve the problem of having two disconnected networks,
> each thinking their DB is the master?

The original question is how I can merge the data of the two
master databases. The answer I'm looking for is the answer to your
question as well. My setup does not solve that problem. I was
thinking about the possibility of shipping all the xlogs from both
databases, putting them into the final master (one of them), and
replaying them to get all the data. Later, I would take care of the
conflicts.

Any ideas?

Thanks,
-Daniel

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Should PQconsumeInput/PQisBusy be expensive to use?

2010-10-27 Thread Alex Hunsaker
On Wed, Oct 27, 2010 at 15:02, Michael Clark  wrote:
> Hello everyone.
> Upon some investigation I found that not calling PQconsumeInput/PQisBusy
> produces results in line with PQexecParams (which PQexecParams seems to be
> doing under the hood).

> (please keep in mind this is just test code and rather simplistic...)
>     int send_result = PQsendQueryParams(self.db,
>                                         [sql UTF8String],
>                                         i,
>                                         NULL,
>                                         (const char *const *)vals,
>                                         (const int *)lens,
>                                         (const int *)formats,
>                                         kTextResultFormat);
>     int consume_result = 0;
>     int is_busy_result = 0;
>
>     while ( ((consume_result = PQconsumeInput(self.db)) == 1) &&
> ((is_busy_result = PQisBusy(self.db)) == 1) )
>         ;

You really want to select() or equivalent here...  This is basically a
busy loop using 100% cpu; neither PQconsumeInput nor PQisBusy does any
kind of sleeping...

Something like:
fd_set read_mask;
int sock = PQsocket(self.db);

while (1)
{
  struct timeval tv = {5, 0};

  /* select() modifies the fd_set, so rebuild it on every iteration */
  FD_ZERO(&read_mask);
  FD_SET(sock, &read_mask);

  select(sock + 1, &read_mask, NULL, NULL, &tv);
  PQconsumeInput(self.db);
  if (!PQisBusy(self.db))
    break;
}

or something...

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Should PQconsumeInput/PQisBusy be expensive to use?

2010-10-27 Thread David Wilson
On Wed, Oct 27, 2010 at 5:02 PM, Michael Clark wrote:

>
> while ( ((consume_result = PQconsumeInput(self.db)) == 1) &&
> ((is_busy_result = PQisBusy(self.db)) == 1) )
> ;
>

The problem with this code is that it's effectively useless as a test.
You're just spinning in a loop; if you don't have anything else to be doing
while waiting for responses, then this sort of calling pattern is always
going to be worse than just blocking.

Only do async if you actually have an async problem, and only do a
performance test on it if you're actually doing a real async test, otherwise
the results are fairly useless.

-- 
- David T. Wilson
david.t.wil...@gmail.com


Re: [GENERAL] Should PQconsumeInput/PQisBusy be expensive to use?

2010-10-27 Thread Tom Lane
Michael Clark  writes:
> In doing some experiments I found that using
> PQsendQueryParams/PQconsumeInput/PQisBusy/PQgetResult produces slower
> results than simply calling PQexecParams.

Well, PQconsumeInput involves at least one extra kernel call (to see
whether data is available) so I don't know why this surprises you.
The value of those functions is if your application can do something
else useful while it's waiting.  If the data comes back so fast that
you can't afford any extra cycles expended on the client side, then
you don't have any use for those functions.

However, if you do have something useful to do, the problem with
this example code is that it's not doing that, it's just spinning:

> while ( ((consume_result = PQconsumeInput(self.db)) == 1) &&
> ((is_busy_result = PQisBusy(self.db)) == 1) )
> ;

That's a busy-wait loop, so it's no wonder you're eating cycles there.
You want to sleep, or more likely do something else productive, when
there is no data available from the underlying socket.  Usually the
idea is to include libpq's socket in the set of files being watched
by select() or poll(), and dispatch off to something that absorbs
the data whenever you see some data is available to read.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to merge data from two separate databases into one (maybe using xlogs)?

2010-10-27 Thread Vick Khera
On Wed, Oct 27, 2010 at 5:19 PM,   wrote:
> thinking about the possibility of shipping all the xlogs from both
> databases, putting them into the final master (one of them), and
> replaying them to get all the data. Later, I would take care of the
> conflicts.
>

Again, I recommend you look at Bucardo if you want multi-master replication.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] exceptionally large UPDATE

2010-10-27 Thread Ivan Sergio Borgonovo
I have to make a large UPDATE to a DB.
The largest UPDATE involves a table that has triggers and a gin
index on a computed tsvector.
The table is 1.5M records with about 15 fields of different types.
I've roughly 2.5-3GB of RAM dedicated to postgres.

The UPDATE queries are simple; a few of them use joins, and they mainly
consist of updating records from temporary tables that have a very
similar structure to the target.

These updates are rare, so I can afford to tune postgresql just for
this large update and then return to a more balanced configuration.
I can even afford to be the only user of the DB, so responsiveness of
the application using the DB is not an issue. Duration of the update
is.

Anything I can tune in postgresql.conf to speed up the UPDATE?

I'm increasing maintenance_work_mem to 180MB just before recreating
the gin index. Should it be more?
The update should be monolithic and it is inside a single
transaction. Since I can afford to be the only user of the DB for a
while, is there anything I can tune to take advantage of it?
What else could I change to speed up the update?

The triggers recreate the tsvector. One of the components of the
tsvector is taken from a join table.

I'll surely drop the gin index and recreate it when everything is
over.
I'm not sure if it's a good idea to drop the triggers, since I'd then
have to update the tsvector later and I suspect this would cause twice
the disk IO.
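
For concreteness, this is roughly the kind of session I have in mind (table,
column and index names are made up):

  SET maintenance_work_mem = '180MB';
  DROP INDEX items_tsv_gin;

  BEGIN;
  UPDATE items i
     SET price = t.price, descr = t.descr
    FROM tmp_items t
   WHERE t.item_id = i.item_id;
  COMMIT;

  CREATE INDEX items_tsv_gin ON items USING gin (tsv);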

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] exceptionally large UPDATE

2010-10-27 Thread Rob Sargent



Ivan Sergio Borgonovo wrote:

> I have to make a large UPDATE to a DB.
> The largest UPDATE involves a table that has triggers and a gin
> index on a computed tsvector.
> The table is 1.5M records with about 15 fields of different types.
> I've roughly 2.5-3GB of RAM dedicated to postgres.
>
> The UPDATE queries are simple; a few of them use joins, and they mainly
> consist of updating records from temporary tables that have a very
> similar structure to the target.
>
> These updates are rare, so I can afford to tune postgresql just for
> this large update and then return to a more balanced configuration.
> I can even afford to be the only user of the DB, so responsiveness of
> the application using the DB is not an issue. Duration of the update
> is.
>
> Anything I can tune in postgresql.conf to speed up the UPDATE?
>
> I'm increasing maintenance_work_mem to 180MB just before recreating
> the gin index. Should it be more?
> The update should be monolithic and it is inside a single
> transaction. Since I can afford to be the only user of the DB for a
> while, is there anything I can tune to take advantage of it?
> What else could I change to speed up the update?
>
> The triggers recreate the tsvector. One of the components of the
> tsvector is taken from a join table.
>
> I'll surely drop the gin index and recreate it when everything is
> over.
> I'm not sure if it's a good idea to drop the triggers, since I'd then
> have to update the tsvector later and I suspect this would cause twice
> the disk IO.
>
> thanks


Is there an inherent value in a single transaction for such an update?
By that I mean: do all the updates actually pertain to a single event?
Nice as it is to get a clean slate if a single record has a problem,
it's also nice when N-1 of N batches succeed in a realistic amount of
time and you're left hunting for the problematic record in one Nth of
the records.


Corollary: if you can afford to be the only user for a while, perhaps you
can afford to reload from a dump if you need to get back to ground zero.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general